Solved

I will offer points for even good suggestions, if not fixes (Server 2k8 x64 Active Directory and possibly desktop authentication issues)

Posted on 2014-12-03
Last Modified: 2014-12-10
Ok,

So here is the dilemma, which I honestly have been on for the past month and can't seem to track down. It hasn't been just me either: a support case was opened with Microsoft, and they can't seem to figure this out either, so if I get no replies here I understand why. (Open to suggestions.)

The issue is that the domain logon server itself is disappearing, which in a high-load environment I would understand and put down to over-utilization. The environment here is 34 people, and at any given time there are only 20-25 in the office. But, for whatever reason, I am getting clients disconnecting from the domain controller. I created a domain policy to try to force the systems to wait for the network, thinking this would at minimum point out whether this is a network issue or a server issue. But the policy setting doesn't seem to make a difference. (The policy is: wait for the network and require a domain controller to log into the PC.)

The DC is a 2k8 x64 server with 64 GB of registered DDR2 memory.

It has two 8-core Xeon processors.

It is a RAID 6 array of SSDs that do 6 Gbps.

The server is fully patched, all drivers (NIC included) have been updated, and there is no NIC teaming.

The server is rebooted 4 times a year, once every 3 months.

The network is a Gigabit network using Cat6 and Juniper hardware

There are no internal firewalls, and the servers do not run an AV.

The desktops are all Windows 7 x64 and vary in hardware, with the worst being Core 2s and the best being i7s.

Memory varies from 4 GB to 16 GB.

Hard disks are all at least 250 GB, SATA, 7200 RPM.

All the Windows 7 machines are updated weekly through WSUS.

All the Windows 7 machines have the same AV installed, and all have the firewall turned off and network discovery turned on.

They were all clean builds, never using Ghost or any cloning.

I haven't been able to track down where the issue is coming from. Has anyone else seen this issue?
LogonServerNA.png
LogonServer-NA2.png
Question by:Rob G
22 Comments
 
LVL 53

Expert Comment

by:McKnife
ID: 40479652
Hi.

Let's work on the error description a little. I bet it will be very easy to find out.
-> At what point in time does it disconnect (= show no logon server variable content)? Please set up an alert task that informs you when that happens: query the variable and, if it is not \\dc, send a mail together with echo %time% and info on uptime (which can be read out by script using psinfo).
-> Does it happen at all with clean systems (no software on them, just a completely naked, domain-joined Win7 with no policies applied)?
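
A minimal sketch of such a check, assuming Python is available on the clients. The DC name \\DC01 is hypothetical, and the function only builds the alert line; mailing it and adding uptime (e.g. from psinfo) is left to whatever scheduled task calls it.

```python
import os
from datetime import datetime

# Hypothetical DC name; replace with the real logon server.
EXPECTED_LOGON_SERVER = r"\\DC01"


def check_logon_server(env=None, expected=EXPECTED_LOGON_SERVER, now=None):
    """Return (ok, message) describing the LOGONSERVER variable.

    ok is False when the variable is empty, missing, or names anything
    other than the expected DC (for example the local computer name).
    """
    env = os.environ if env is None else env
    now = now or datetime.now()
    value = env.get("LOGONSERVER", "").strip()
    ok = value.upper() == expected.upper()
    state = "OK" if ok else "ALERT"
    shown = value or "<empty>"
    message = f"{now:%Y-%m-%d %H:%M:%S} {state} LOGONSERVER={shown} expected={expected}"
    return ok, message


if __name__ == "__main__":
    ok, message = check_logon_server()
    print(message)
```

Run from a scheduled task in the user's session, the printed line gives the time of day at which the variable stopped pointing at the DC.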
 
LVL 13

Assisted Solution

by:Greg Hejl
Greg Hejl earned 245 total points
ID: 40480696
Do you have any network monitoring in place?  PRTG has a free version you can point at the server.

Maybe the physical network is having issues. Have you run a ping overnight?

Replace any patch cords from the switch to the server.

The next place to look would be in the event logs.

Is the DC a virtual machine? The hardware is a bit overkill for one DC. Is it running multiple services?
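
One way to script an overnight reachability check, sketched in Python. Note that this swaps the ICMP ping for a plain TCP connect to the DC, since raw ICMP needs elevated rights; the port (445, SMB) and the function names are assumptions, not anything from this thread.

```python
import socket
import time


def probe(host, port=445, timeout=2.0):
    """Return the TCP connect time in milliseconds, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return None


def monitor(host, port=445, interval=5.0, count=3):
    """Probe `count` times; return a list of (timestamp, latency_ms_or_None)."""
    results = []
    for i in range(count):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        results.append((stamp, probe(host, port)))
        if i < count - 1:
            time.sleep(interval)
    return results
```

Calling monitor() with the DC's name and a large count from a problem desktop, then looking for None entries, shows whether and when the DC stopped answering.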
 
LVL 6

Author Comment

by:Rob G
ID: 40480697
Knife,
It seems to happen on machines sporadically. There are currently about 8 machines with this issue as of today, but yesterday there were 9, so it appears that one corrected itself at logon today. The error logs on the domain controller show nothing out of the ordinary, as in nothing that would make me think the issue is on the DC side. The DC errors that are shown are:

Failed extract of third-party root list from auto update cab at: <http://www.download.windowsupdate.com/msdownload/update/v3/static/trustedr/en/authrootstl.cab> with error: A required certificate is not within its validity period when verifying against the current system clock or the timestamp in the signed file.

Event ID 11
Source CAPI2

--and--

System

Driver xxx required for printer xxx is unknown contact the administrator to install the driver before you log in again..

Event ID 1111
Source Terminal Service Printers

I assume both of these are harmless ("null") errors.
I assumed the Application one was due to the system not getting the updated information from Windows Update, since the server is segregated, and the System one came from people (admins) logging into the server who have the printer option checked in RDS.


The clients never report "no logon server" at login. They log in fine, and the missing logon server only shows up in the system information or when you query the logon server from the client side. That is confusing, since the policy, again, is set to not let the clients log into the desktop without the DC being available. But at no time does the desktop fail to log into the domain. The policy is set, I can see it on the desktop, but again, it's got me baffled.

My desktop, which does not have the issue, is on the same switch, has the same software plus additional administrator tools, and logs into the same domain. The only difference between my desktop and the others that I can tell is that I log into it with an administrative account. But there are many other machines here without the issue that are logged into with a user account, so I can't seem to figure out the difference. Same policy and all. I added myself to the policy when I noticed the error occurring, figuring that if it were policy based I would eventually have the same issue. Just luck I guess.. LOL
 
LVL 13

Expert Comment

by:Greg Hejl
ID: 40480703
Do you have NTP set up as a GPO? Are all the end devices on the correct time?
 
LVL 6

Author Comment

by:Rob G
ID: 40480718
Greg,
No, it is physical. It was a leftover after we migrated all the main servers to a Hyper-V based virtual server farm. The "overkill" server was just an older Dell server we had available to use as the DC, which was upgraded from 2k3 about a year ago. I can't tell if the issue started around then or not, since I was not working here during that time. It is possible that something is jacked from that time.
 
LVL 6

Author Comment

by:Rob G
ID: 40480726
Greg,
No and yes. There is no internal NTP source set through Group Policy, although I have contemplated testing that out to see if it would help. But the clocks all look to be at the correct time. They are pretty much spot on too, with a difference of about 0.5 seconds between the desktops and the server when the minute changes. The most I have seen is on one of the old (I mean REALLY old) machines, where the time can get out of sync by as much as 20 seconds, but the threshold should be set for 30 seconds, and ironically that machine doesn't have the issue.
 
LVL 13

Expert Comment

by:Greg Hejl
ID: 40480750
Was the DC a fresh install of 2008R2 then DCPromo'd into the domain?
 
LVL 6

Author Comment

by:Rob G
ID: 40480780
Greg,
No, it was an "upgrade", which as far as I can tell was originally NT4 directory services, migrated to 2000 Advanced Server AD, migrated to 2003 Standard AD, migrated to 2k3 R2 x64 AD, migrated to 2008 x64 Standard AD, which is where it currently sits.
 
LVL 13

Expert Comment

by:Greg Hejl
ID: 40480807
Do you have other DCs running on the network?
 
LVL 11

Assisted Solution

by:hecgomrec
hecgomrec earned 50 total points
ID: 40480895
Please check on all your stations that the NIC stays on all the time.

The machine may turn off the network card to save power. Remove that setting from the NIC using Device Manager.

Make sure you don't have DNS problems.
 
LVL 6

Author Comment

by:Rob G
ID: 40480929
Greg,
No, just a single domain controller for roughly 25 end users. But they log in at varying times, typically from 7-9 AM, with each dept being about 5-10 people.


Hec,
Power management is disabled through policy, in that there is no hibernation or sleep, and the NIC does not turn off, nor does it allow power saving. The NIC is set to auto-negotiate, but the switch ports are set to 1 Gb full duplex. All desktops are set to 1 Gb full or 10/100/1000 auto.
 
LVL 13

Expert Comment

by:Greg Hejl
ID: 40480980
Do you have a Hyper-V environment available?

I would start up a new 2012R2 instance, DCPromo it, and transfer all roles to it. Demote the DC in place now and disjoin it from the domain. You will also need to adjust the DHCP service to reflect the new DC's DNS.

Tear down the current server: get rid of the RAID 6 and make it RAID 10. Make the current DC box a Hyper-V server. Start up a new 2012R2 VM and DCPromo it so you have a minimum of two DCs in place (this is best practice).

You can use the extra capacity to run a network monitor.
 
LVL 53

Assisted Solution

by:McKnife
McKnife earned 205 total points
ID: 40481036
It never has the variable %logonserver% populated at all? That has to be something very basic...
If you are not joined to a domain, or if no logon server was available, that variable would show the local computer name as the logon server; it would never be empty. Did you google for that symptom, "logonserver variable empty", yet?
 
LVL 6

Author Comment

by:Rob G
ID: 40481129
Knife,
Yes, I actually paid Microsoft to connect remotely and figure it out. That ticket was opened about 60 days ago, and I have yet to get any kind of usable solution from them. They can't figure it out either.. LOL..

I did google the hell out of it, thinking I would find something.
It's not constant. One day you will get no logon server, but if you log off and back on (clicking Switch User and entering domain\username and password), it will show the correct logon server for about 7-10 days, then it is gone again. The other odd thing is that if you do the same thing, but log out of the user and try to log in as domain\administrator, it will fail with the error "no logon servers available". That made me think "cached profile", but when I deleted the cache and tried again, it worked for the user and not the admin. So I pulled the machine, moved it to another network port and tried again: still the same error. But then the next day, both accounts worked fine on that system. No updates, no physical changes anywhere...

Greg,
I actually have another system with the DC cloned on a different VLAN, with a few test VM desktops connected to it. I can't replicate it on that side, which is weird since everything is identical, except that the OS is on a 2k8R2 Hyper-V host instead of physical and the switches are ancient Ciscos.

That made me think..
I am pretty sure that on the test side I changed the DNS Domain Name in the DHCP settings,
which makes me wonder if maybe this is a DNS/DHCP issue?

Does anyone here who knows the scope options know what scope option 015 will do if it is not correct, or is set to a secondary DNS name rather than the current DNS name of the domain?

For instance, if the domain was originally test1.com
and you change it to Test2.local
but do not update 015 in the scope options of the DHCP list, will that cause issues?
Does anyone have a good link to what that 015 option does, or what it controls?
 
LVL 13

Expert Comment

by:Greg Hejl
ID: 40481451
Cloning DCs is never a good thing....

Scope option 15 should have your Active Directory name in it.

The DNS option should point to your two Active Directory servers. This could also be an issue with desktops not finding the logon server: DNS must be the Active Directory servers.

Make sure you do not have two DHCP servers on your network (unless it's 2012R2 DHCP failover). This can cause logon issues too.
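
The three checks above can be written down as a small audit, sketched here in Python. This is only an illustration: the function name is made up, and the values passed in (domain name, option 015 content, DNS and DHCP server addresses) are assumed to be read by hand from the DHCP console and the network, not queried by the code.

```python
def audit_dhcp_scope(ad_domain, option_015, dns_servers, dc_addresses, dhcp_servers):
    """Return a list of findings; an empty list means nothing was flagged."""
    findings = []

    # Check 1: option 015 should carry the Active Directory DNS name.
    def normalise(name):
        return name.strip().rstrip(".").lower()

    if normalise(option_015) != normalise(ad_domain):
        findings.append(f"option 015 is '{option_015}', expected '{ad_domain}'")

    # Check 2: every DNS server handed out should be a domain controller.
    stray = sorted(set(dns_servers) - set(dc_addresses))
    if stray:
        findings.append("DNS option lists non-DC servers: " + ", ".join(stray))

    # Check 3: only one DHCP server should answer (failover pairs aside).
    if len(set(dhcp_servers)) > 1:
        findings.append("more than one DHCP server answers on this network")

    return findings
```

With the names from the question, audit_dhcp_scope("Test2.local", "test1.com", ...) would report the stale option 015 as its first finding.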
 
LVL 6

Author Comment

by:Rob G
ID: 40488784
Greg, sorry for the delay in getting back.
The DNS is on the current AD server. While there are two servers, there is only one in each VLAN, so only one is visible on each side. There aren't any holes punched through on either VLAN, so there shouldn't be any DHCP duplication, and even if there were, they are on two completely different subnets, so I don't think that would be the issue.
The 015 option is set to the old DNS name, which is not the same as the new one, so I am curious whether this could be the culprit. I don't know enough about the scope options, or about the thinking behind leaving the old DNS name in that option, but I have a feeling that changing it to the correct DNS name would be the solution. Thoughts?

Any idea what changing that could jack up in the current environment?
Also, I found that they once had WINS in the mix, which is long gone, and I removed the traces of that on Saturday.

Thanks so far for the Q/As.
 
LVL 13

Expert Comment

by:Greg Hejl
ID: 40489159
Have you done an overnight ping test from one or more of the problem machines to the DC?

Have you started up a new VM to replace this DC?

Is the 'cloned' DC completely separated from the production network?
(The production DC isn't trying to replicate to it, or vice versa?)

Also try a Message Analyzer capture on one of the problem machines. This app can break down IP traffic by conversation with the DC.
 
LVL 6

Author Comment

by:Rob G
ID: 40491640
Ran a ping last night, along with Wireshark, and never saw a single issue: no dropped packets, no problems at all. Also ran a separate Wireshark capture from one of the affected machines; again no issues, never a dropped packet or error.

I don't plan on migrating this to a new VM. If I can't get this to work, I am going to buy a new physical server and start from scratch, as I would rather dump the time into getting this to work correctly than migrate a problem from one piece of hardware to another.

The cloned server is in no way connected to the live network.
There is no replication set up, outside of Veeam sending data to our offsite data center. That is configured through VPN, but it is only a one-way trust, and the DC isn't set up to know about it. It's segregated.
 
LVL 13

Accepted Solution

by:Greg Hejl
Greg Hejl earned 245 total points
ID: 40491810
I didn't say to make it a virtual machine. Start up a new VM and DCPromo it into your AD. Use DHCP to tell the other computers to use it as the primary domain controller and for DNS. Leave your original server running.

Do not P2V domain controllers. DCs have hooks into the hardware tied into the GUID/SID. This is also why cloning isn't done.

This should take care of your issue.

You could also try Wiresharking both the DC and a problem workstation at the same time as the issue occurs.
 
LVL 6

Author Closing Comment

by:Rob G
ID: 40492221
I actually figured this out about 30 min ago. Turns out it is not an issue on the DC side at all. The issue appears to be with the switch that these machines are plugged into. There are two Juniper gigabit switches and an old 10 Mb switch that I was told no one was connected to, but it seems that the machines with this issue are actually connected to that switch. I have since moved them to the other switch, had the users log into the machines, and the issues are gone. Anyone know a good home for a 20-year-old Cisco switch? lol
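
For anyone chasing the same symptom, a rough Python sketch of the check that would have caught this earlier: list every machine whose negotiated link speed is below what the network is supposed to deliver. The host names and speeds are made-up illustration data, and how the speeds are collected (adapter status on each desktop, or the switch port table) is not shown.

```python
def slow_links(link_speeds_mbps, expected_mbps=1000):
    """Return {host: speed} for hosts linked below the expected speed."""
    return {
        host: speed
        for host, speed in sorted(link_speeds_mbps.items())
        if speed < expected_mbps
    }


# Hypothetical inventory: negotiated link speed in Mbps per desktop.
inventory = {"pc-01": 1000, "pc-07": 10, "pc-12": 100}
suspects = slow_links(inventory)
```

Anything that shows up in suspects on a network that is meant to be all gigabit is worth tracing back to its switch port.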
0
 
LVL 13

Expert Comment

by:Greg Hejl
ID: 40492579
Glad you found the network issue!

That's why I kept going back to Wireshark to troubleshoot!

Enjoy your win!
 
