Solved

Servers randomly failing to authenticate correctly with AD

Posted on 2014-10-07
Last Modified: 2016-11-23
Hello,

I'm running into a very strange issue with my work network. About two months ago we migrated all of our domain controllers to Windows Server 2012 R2. Everything had been running perfectly fine until recently.

Our current setup involves two Dell R720 Windows Server 2012 R2 Hyper-V hosts; each host runs a DC virtual machine. DC1 is on VHOST1 and DC2 is on VHOST2.

On the weekend of September 27th, we ran into a problem with the NIC on VHOST1. Four days in a row, the host lost all connections to the network. We eventually moved the server's network connection to a separate PCI network card and that solved that problem. I'm unsure if this is related to the issue, but I want to mention it in case it's potentially part of the problem.

Unfortunately, since Sunday morning we've been running into some bizarre issues. On Sunday morning, one of our servers completely stopped authenticating against Active Directory. Every password I tried returned the error "Username or password is incorrect", even though the username/password combination was 100% correct and worked on other machines. A restart of that server fixed the problem and authentication started working again.

Yesterday morning, DC1 started doing the same thing. No matter what credentials I gave it, it told me they were wrong. I restarted DC1, and again, the problem went away and authentication succeeded once again.

Last night, another server (different from the other two) stopped authenticating like the others did. Luckily I was able to use local credentials to get into the server and restart it. Once again, a restart fixed the problem.

I've checked the event logs and everything looks 100% fine except for my failed logins. I've attached a few examples of what might point to a problem, but errors in the event logs are very uncommon. The error about replication problems only appears about once or twice a day, which I don't personally see as the problem.

DCDiag shows only one problem but it seems unrelated to the issue:

      Starting test: FrsEvent
         There are warning or error events within the last 24 hours after the
         SYSVOL has been shared.  Failing SYSVOL replication problems may cause
         Group Policy problems.

I've attached the log from dcdiag as well.

Replication seems to be operating smoothly between the two DCs.

I'm very confused as to what could be causing these problems.

Anyone have any ideas? Have you run into these problems before?

Any help would be much appreciated.

Thank you,

John Caspary
Event-Log-AD-DS-1.txt
Event-Log-AD-DS-2.txt
Event-Log-Failed-Login-SunMorning.txt
Event-Log-Failed-Login1.txt
DC-DIAG.txt
Question by:JohnMan777
18 Comments
 
LVL 70

Expert Comment

by:Chris Dent
ID: 40367745
Hmm, can you tell us a bit about your DNS settings (TCP/IP configuration, both DCs)?

And, if you open the DNS console, can you tell us which zones you can see? Please also detail whether or not you have an _msdcs.businessname.local zone, or you see an _msdcs folder underneath businessname.local.

You have a lot of (apparently) missing DNS records in the log.

Authentication depends on correct DNS configuration, so I'd hesitate to explore anything else until the state of that dependency is clear.
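
To make the DNS check concrete, here is a minimal sketch in Python that lists the service (SRV) record names a domain controller normally registers and prints an `nslookup` command for each one. It assumes the domain is `businessname.local` as in the placeholder above; the function names are hypothetical helpers for illustration, and the prefix list covers only the common locator records, not every record a DC may register.

```python
from typing import List

# Common SRV record prefixes registered by domain controllers.
# This is a partial list intended for a quick sanity check.
DOMAIN_SRV_PREFIXES = [
    "_ldap._tcp",
    "_kerberos._tcp",
    "_kerberos._udp",
    "_kpasswd._tcp",
    "_kpasswd._udp",
    "_ldap._tcp.dc._msdcs",
    "_kerberos._tcp.dc._msdcs",
    "_ldap._tcp.pdc._msdcs",
]


def expected_srv_names(domain: str) -> List[str]:
    """Return the fully qualified SRV names expected for an AD domain."""
    zone = domain.strip().rstrip(".").lower()
    return [f"{prefix}.{zone}" for prefix in DOMAIN_SRV_PREFIXES]


def nslookup_commands(domain: str) -> List[str]:
    """Build one nslookup command per expected SRV record."""
    return [f"nslookup -type=SRV {name}" for name in expected_srv_names(domain)]


if __name__ == "__main__":
    # "businessname.local" is the placeholder domain used in this thread.
    for command in nslookup_commands("businessname.local"):
        print(command)
```

Running the printed commands on each DC, against each DNS server in turn, should show whether both DCs appear in every answer.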

Chris
 
LVL 19

Expert Comment

by:compdigit44
ID: 40369228
Can you post the results of the following:

repadmin /showrepl

1) Is the time correct?
2) By any chance is the new NIC's MAC address listed for a reservation in DHCP?
3) Any errors on the switch port the NIC is connected to?
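
On the time question: Kerberos by default rejects requests when the clocks of the two machines differ by more than five minutes. A rough sketch of that comparison in Python follows; the timestamps are made-up values for illustration, and the five-minute limit is the default, which policy may have changed in your domain.

```python
from datetime import datetime, timedelta

# Default Kerberos tolerance for clock differences (assumed unchanged by policy).
KERBEROS_DEFAULT_MAX_SKEW = timedelta(minutes=5)


def clock_skew(first: datetime, second: datetime) -> timedelta:
    """Absolute difference between two clock readings."""
    return abs(first - second)


def within_kerberos_tolerance(
    first: datetime,
    second: datetime,
    max_skew: timedelta = KERBEROS_DEFAULT_MAX_SKEW,
) -> bool:
    """True if the two readings are close enough for Kerberos by default."""
    return clock_skew(first, second) <= max_skew


if __name__ == "__main__":
    # Hypothetical readings taken at the same moment on two servers.
    dc1_time = datetime(2014, 10, 9, 0, 0, 0)
    dc2_time = datetime(2014, 10, 9, 0, 4, 30)
    print(clock_skew(dc1_time, dc2_time), within_kerberos_tolerance(dc1_time, dc2_time))
```

Remember that the Hyper-V host and the guest can disagree too, so it is worth comparing each VM against its host as well as the DCs against each other.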
 

Author Comment

by:JohnMan777
ID: 40369233
Chris,

Thank you for the tip. I've cleaned up the DNS and the tests are coming back passing 100%.

I'll keep an eye on things for the next few days and see if it resolves the problem.

Thank you,

John Caspary
 

Author Comment

by:JohnMan777
ID: 40370961
Chris,

Unfortunately the issues came back this morning. At about midnight, DC2 was locked out. About 2 hours later, another server did the same.

Same error when trying to log in: "Invalid Username or Password".

This morning I turned off Kerberos Authentication on DC2 (luckily I was already logged into the server) and authentication (although very slow) started working. Turned Kerberos services back on, authentication failed again.

I restarted DC2 and everything started working normally again but unfortunately the other server (RIP6) didn't fix itself. I had to restart it.

I'm pretty stumped.

compdigit44, I've attached the results. The logs were pulled right after DC2 was restarted so maybe the errors could signify something?

Time is 100% synced between both servers. The NIC isn't reserved and there are no switch port errors.

Thanks,

John
Repadmin-DC1.txt
Repadmin-DC2.txt
 
LVL 19

Expert Comment

by:compdigit44
ID: 40371111
Can you please upload a screenshot of your AD DNS structure? You can block out any private information, of course.

I see errors in one of your repadmin uploads.

Can you please run the following command on DC1 and DC2:

dcdiag /test:checksecurityerror
 

Author Comment

by:JohnMan777
ID: 40371119
How would you prefer I do that? Screenshot of what's in _msdcs. ?
 
LVL 19

Expert Comment

by:compdigit44
ID: 40371129
Yes... as much info as you can upload would be great...
 

Author Comment

by:JohnMan777
ID: 40371143
compdigit44,

Please see the attached DNS export.

Thanks,

John
DNS-Export-msdcs.txt
 
LVL 70

Expert Comment

by:Chris Dent
ID: 40371146
You don't take snapshots do you? (No is good).

Otherwise it would be great to see if there are any events in the other logs too (both DCs).

Chris
 
LVL 70

Expert Comment

by:Chris Dent
ID: 40371151
_msdcs looks fine. Can we have the service records from the parent zone too?

Chris
 

Author Comment

by:JohnMan777
ID: 40371186
I'm sorry. What exactly do you mean by service records from the parent zone?

Thanks,

John
 

Author Comment

by:JohnMan777
ID: 40371223
I ran an extended diagnostic test on the DC. Maybe this could help?

Please see attached.

Thank you,

John Caspary
dcdiag-extended.txt
 
LVL 70

Expert Comment

by:Chris Dent
ID: 40371240
That's a remarkably, and frustratingly, clean report. You don't often see them like that.

Perhaps check the log files in c:\Windows\debug and event logs again (around the time of the failure), including security.

If that fails, we can always increase logging. Bit tricky sorting the good from the bad if we do though.
 
LVL 19

Expert Comment

by:compdigit44
ID: 40371585
Your repadmin output had the following error: "The target principal name is incorrect".

That could point to DNS, computer account problems (Kerberos), or SPN problems...
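
On the SPN side, the usual thing to look for is the same SPN registered on more than one account (`setspn -X` searches for this directly). As a rough illustration of the check itself, here is a small Python sketch that takes SPN lists per account and reports any SPN owned by more than one account. The account names and SPNs in the example are hypothetical, and the function name is only for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def find_duplicate_spns(spns_by_account: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each SPN registered on more than one account to its owners.

    SPNs are compared case-insensitively after trimming whitespace.
    """
    owners = defaultdict(set)
    for account, spns in spns_by_account.items():
        for spn in spns:
            owners[spn.strip().lower()].add(account)
    return {spn: sorted(accounts) for spn, accounts in owners.items() if len(accounts) > 1}


if __name__ == "__main__":
    # Hypothetical data; in practice this would come from the SPN lists of each account.
    sample = {
        "DC1$": ["ldap/DC1.businessname.local", "HOST/DC1.businessname.local"],
        "DC2$": ["ldap/DC2.businessname.local", "host/dc1.businessname.local"],
    }
    for spn, accounts in find_duplicate_spns(sample).items():
        print(spn, "is registered on", ", ".join(accounts))
```

An empty result does not rule out a stale computer account password, so it only narrows things down.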

Have you read the following EE article? http://www.experts-exchange.com/Database/MS-SQL-Server/Q_28417538.html
 

Author Comment

by:JohnMan777
ID: 40371991
Chris,

I looked deeper into it and I can't find anything that sticks out. In fact, the "Directory Service" log shows almost nothing.

10/9/2014 12:23AM - Internal event: The Address Book hierarchy table has been rebuilt.
10/9/2014 12:39AM - NTDS (760) NTDSA: Online defragmentation is beginning a full pass on database 'C:\Windows\NTDS\ntds.dit'.
10/9/2014 12:39AM - NTDS (760) NTDSA: Online defragmentation has completed a full pass on database 'C:\Windows\NTDS\ntds.dit', freeing 4 pages. This pass started on 10/9/2014 and ran for a total of 0 seconds, requiring 1 invocations over 1 days. Since the database was created it has been fully defragmented 61 times.
10/9/2014 4:31AM - NTDS (760) NTDSA: Shadow copy instance 16 freeze started.

In the System log, there are a lot of authentication errors from the period when the machine was locked out. That's not surprising, though.
 

Author Comment

by:JohnMan777
ID: 40373622
compdigit44,

I've attached the SPN lists for the two DCs. Do you see anything that sticks out?

Thanks,

John
DC1-SPN-LIST.txt
DC2-SPN-LIST.txt
 

Accepted Solution

by:
JohnMan777 earned 0 total points
ID: 41571809
This issue was never solved.
 

Author Closing Comment

by:JohnMan777
ID: 41577530
d
