Solved

Servers randomly failing to authenticate correctly with AD

Posted on 2014-10-07
Last Modified: 2016-11-23
Hello,

I'm running into a very strange issue with my work network. About two months ago we migrated all of our domain controllers to Windows Server 2012 R2. Everything had been running perfectly fine until recently.

Our current setup involves two Dell R720 Windows Server 2012 R2 Hyper-V hosts, each running one DC virtual machine. DC1 is on VHOST1 and DC2 is on VHOST2.

On the weekend of September 27th, we ran into a problem with the NIC on VHost1. Four days in a row, the host lost all connectivity to the network. We eventually moved the server's network connection to a separate PCI network card, which solved that problem. I'm unsure if this is related to the issue, but I want to mention it in case it's part of the problem.

Unfortunately, since Sunday morning we've been running into some bizarre issues. On Sunday morning, one of our servers completely stopped authenticating against Active Directory. Every password I tried returned the error "Username or password is incorrect" even though the username/password combination was 100% correct and worked on other machines. A restart of that server fixed the problem and authentication started working again.

Yesterday morning, DC1 started doing the same thing. No matter what credentials I gave it, it told me they were wrong. I restarted DC1, and again, the problem went away and authentication succeeded once again.

Last night, another server (different from the other two) stopped authenticating like the others did. Luckily I was able to use local credentials to get into the server and restart it. Once again, a restart fixed the problem.

I've checked the event logs and everything looks fine except for the failed logins. I've attached a few examples that might point to a problem, but errors in the event logs are rare. The replication error only appears once or twice a day, so I don't personally see it as the cause.

DCDiag shows only one problem but it seems unrelated to the issue:

      Starting test: FrsEvent
         There are warning or error events within the last 24 hours after the
         SYSVOL has been shared.  Failing SYSVOL replication problems may cause
         Group Policy problems.

I've attached the log from dcdiag as well.

Replication seems to be operating smoothly between the two DCs.

I'm very confused as to what could be causing these problems.

Anyone have any ideas? Have you run into these problems before?

Any help would be much appreciated.

Thank you,

John Caspary
Event-Log-AD-DS-1.txt
Event-Log-AD-DS-2.txt
Event-Log-Failed-Login-SunMorning.txt
Event-Log-Failed-Login1.txt
DC-DIAG.txt
Question by:JohnMan777
18 Comments
 
LVL 71

Expert Comment

by:Chris Dent
ID: 40367745
Hmm can you tell us a bit about your DNS settings (TCP/IP configuration, both DCs)?

And, if you open the DNS console, can you tell us which zones you can see? Please also detail whether or not you have an _msdcs.businessname.local zone, or you see an _msdcs folder underneath businessname.local.

You have a lot of (apparently) missing DNS records in the log.

Authentication depends on correct DNS configuration, so I'd hesitate to explore anything else until the state of that dependency is clear.

Chris
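
For readers following along, here is a minimal sketch of the kind of DNS check being asked about. It only builds the SRV record names that domain-joined machines commonly look up to find a DC and prints an `nslookup` command for each; it does not query DNS itself. The function name `dc_locator_srv_names` is made up for this illustration, and `businessname.local` is the placeholder domain used in the comment above.

```python
def dc_locator_srv_names(domain: str) -> list[str]:
    """Return SRV record names commonly queried to locate a domain controller."""
    domain = domain.strip(".").lower()
    return [
        f"_ldap._tcp.{domain}",
        f"_kerberos._tcp.{domain}",
        f"_kpasswd._tcp.{domain}",
        # Records under _msdcs, whether it is its own zone or a folder in the parent zone
        f"_ldap._tcp.dc._msdcs.{domain}",
        f"_kerberos._tcp.dc._msdcs.{domain}",
        f"_ldap._tcp.pdc._msdcs.{domain}",
    ]


if __name__ == "__main__":
    # Print commands to paste into a prompt on each DC and on an affected member server.
    for name in dc_locator_srv_names("businessname.local"):
        print(f"nslookup -type=SRV {name}")
```

Each lookup should return at least one DC. A name that fails to resolve on one machine but resolves on another would suggest the two are pointed at different DNS servers.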
 
LVL 20

Expert Comment

by:compdigit44
ID: 40369228
Can you post the results of the following:

repadmin /showrepl

1) Is the time correct?
2) By any chance, is the new NIC's MAC address listed for a reservation in DHCP?
3) Any errors on the switch port the NIC is connected to?
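
On the time question: Kerberos rejects requests when the clocks of the two machines differ by more than the configured tolerance, which is 5 minutes in the default Active Directory Kerberos policy. A rough sketch of that comparison follows, assuming you have noted the current time on each machine by hand; the helper name `within_kerberos_skew` and the timestamps are invented for the example.

```python
from datetime import datetime, timedelta

# Default "Maximum tolerance for computer clock synchronization" in AD Kerberos policy.
MAX_SKEW = timedelta(minutes=5)


def within_kerberos_skew(clock_a: datetime, clock_b: datetime,
                         max_skew: timedelta = MAX_SKEW) -> bool:
    """Return True if two clock readings are close enough for Kerberos to accept."""
    return abs(clock_a - clock_b) <= max_skew


if __name__ == "__main__":
    # Hypothetical readings taken at the same moment on a DC and a member server.
    dc_clock = datetime(2014, 10, 9, 0, 5, 0)
    member_clock = datetime(2014, 10, 9, 0, 11, 30)
    print(within_kerberos_skew(dc_clock, member_clock))
```

If the policy in your domain has been changed from the default, pass the configured value as `max_skew`.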
 

Author Comment

by:JohnMan777
ID: 40369233
Chris,

Thank you for the tip. I've cleaned up the DNS and the tests are coming back passing 100%.

I'll keep an eye on things for the next few days and see if it resolves the problem.

Thank you,

John Caspary
 

Author Comment

by:JohnMan777
ID: 40370961
Chris,

The issues unfortunately repeated themselves again this morning. At about midnight, DC2 was locked out. About 2 hours later, another server did the same.

Same error when trying to log in "Invalid Username or Password".

This morning I turned off Kerberos Authentication on DC2 (luckily I was already logged into the server) and authentication (although very slow) started working. Turned Kerberos services back on, authentication failed again.

I restarted DC2 and everything started working normally again but unfortunately the other server (RIP6) didn't fix itself. I had to restart it.

I'm pretty stumped.

compdigit44, I've attached the results. The logs were pulled right after DC2 was restarted so maybe the errors could signify something?

Time is 100% synced between both servers. The NIC isn't reserved in DHCP and there are no switch port errors.

Thanks,

John
Repadmin-DC1.txt
Repadmin-DC2.txt
 
LVL 20

Expert Comment

by:compdigit44
ID: 40371111
Can you please upload a screenshot of your AD DNS structure? You can block out any private information, of course.

I see errors in one of your repadmin uploads.

Can you please run the following command on DC1 and DC2:

dcdiag /test:checksecurityerror
 

Author Comment

by:JohnMan777
ID: 40371119
How would you prefer I do that? Screenshot of what's in _msdcs. ?
 
LVL 20

Expert Comment

by:compdigit44
ID: 40371129
Yes... as much info as you can upload would be great...
 

Author Comment

by:JohnMan777
ID: 40371143
compdigit44,

Please see the attached DNS export.

Thanks,

John
DNS-Export-msdcs.txt
 
LVL 71

Expert Comment

by:Chris Dent
ID: 40371146
You don't take snapshots do you? (No is good).

Otherwise it would be great to see if there are any events in the other logs too (both DCs).

Chris
 
LVL 71

Expert Comment

by:Chris Dent
ID: 40371151
_msdcs looks fine. Can we have the service records from the parent zone too?

Chris
 

Author Comment

by:JohnMan777
ID: 40371186
I'm sorry. What exactly do you mean by service records from the parent zone?

Thanks,

John
 

Author Comment

by:JohnMan777
ID: 40371223
I ran an extended diagnostic test on the DC. Maybe this could help?

Please see attached.

Thank you,

John Caspary
dcdiag-extended.txt
 
LVL 71

Expert Comment

by:Chris Dent
ID: 40371240
That's a remarkably, and frustratingly, clean report. You don't often see them like that.

Perhaps check the log files in c:\Windows\debug and event logs again (around the time of the failure), including security.

If that fails, we can always increase logging. Bit tricky sorting the good from the bad if we do though.
 
LVL 20

Expert Comment

by:compdigit44
ID: 40371585
Your repadmin output had the following error: "The target principal name is incorrect".

That could point to DNS, computer account problems (Kerberos), or SPN problems.

Have you read the following EE article? http://www.experts-exchange.com/Database/MS-SQL-Server/Q_28417538.html
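
To make errors like that easier to spot in a saved `repadmin /showrepl` report, something along these lines could help. It is only a sketch: it assumes the usual layout in which a failed "Last attempt" line is followed by the error text, the function name `find_failures` is made up, and the `SAMPLE` text is invented for illustration rather than taken from the attachments in this thread.

```python
import re

# Matches the line repadmin prints when the most recent replication attempt failed.
FAILED_ATTEMPT = re.compile(r"Last attempt @ .* failed", re.IGNORECASE)


def find_failures(report: str) -> list[tuple[str, str]]:
    """Pair each failed 'Last attempt' line with the next non-empty line (the reason)."""
    lines = [line.strip() for line in report.splitlines()]
    failures = []
    for index, line in enumerate(lines):
        if FAILED_ATTEMPT.search(line):
            reason = next((text for text in lines[index + 1:] if text), "")
            failures.append((line, reason))
    return failures


# Invented sample in the assumed layout; not real output from the poster's DCs.
SAMPLE = """\
DC=businessname,DC=local
    Default-First-Site-Name\\DC1 via RPC
        Last attempt @ 2014-10-09 00:05:12 failed, result -2146893022 (0x80090322):
            The target principal name is incorrect.
        Last success @ 2014-10-08 23:50:10.
"""

if __name__ == "__main__":
    for attempt, reason in find_failures(SAMPLE):
        print(attempt)
        print("  ->", reason)
```

Running it against the reports from both DCs shows whether the failure appears in one direction only, which helps narrow down which DC's computer account or SPNs to look at.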
 

Author Comment

by:JohnMan777
ID: 40371991
Chris,

I looked deeper into it and I can't find anything that sticks out. In fact, the "Directory Service" log shows almost nothing.

10/9/2014 12:23AM - Internal event: The Address Book hierarchy table has been rebuilt.
10/9/2014 12:39AM - NTDS (760) NTDSA: Online defragmentation is beginning a full pass on database 'C:\Windows\NTDS\ntds.dit'.
10/9/2014 12:39AM - NTDS (760) NTDSA: Online defragmentation has completed a full pass on database 'C:\Windows\NTDS\ntds.dit', freeing 4 pages. This pass started on 10/9/2014 and ran for a total of 0 seconds, requiring 1 invocations over 1 days. Since the database was created it has been fully defragmented 61 times.
10/9/2014 4:31AM - NTDS (760) NTDSA: Shadow copy instance 16 freeze started.

In the System Log, there are a lot of errors in regards to authentication while the machine was locked out. That's not surprising though.
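
When sifting through entries like the ones above, it can help to narrow them to a window around the time of the lockout. The following is a minimal sketch that parses lines in the same "date time - message" shape as the excerpt in this comment. The names `parse_entry` and `entries_near`, the 30-minute window, and the midnight failure time are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def parse_entry(line: str) -> tuple[datetime, str]:
    """Split a 'M/D/YYYY H:MMAM - message' line into a timestamp and the message."""
    stamp, _, message = line.partition(" - ")
    return datetime.strptime(stamp.strip(), "%m/%d/%Y %I:%M%p"), message


def entries_near(lines, failure_time, window=timedelta(minutes=30)):
    """Return the (timestamp, message) pairs that fall within the window of the failure."""
    hits = []
    for line in lines:
        when, message = parse_entry(line)
        if abs(when - failure_time) <= window:
            hits.append((when, message))
    return hits


LOG = [
    "10/9/2014 12:23AM - Internal event: The Address Book hierarchy table has been rebuilt.",
    r"10/9/2014 12:39AM - NTDS (760) NTDSA: Online defragmentation is beginning a full pass on database 'C:\Windows\NTDS\ntds.dit'.",
    "10/9/2014 4:31AM - NTDS (760) NTDSA: Shadow copy instance 16 freeze started.",
]

if __name__ == "__main__":
    # Hypothetical failure time; substitute the moment authentication actually stopped.
    for when, message in entries_near(LOG, datetime(2014, 10, 9, 0, 0)):
        print(when, message)
```

The same filter can be applied to exported System and Security log entries, provided they are first put into the same line format.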
 

Author Comment

by:JohnMan777
ID: 40373622
compdigit44,

I've attached the SPN lists for the two DCs. Do you see anything that sticks out?

Thanks,

John
DC1-SPN-LIST.txt
DC2-SPN-LIST.txt
 

Accepted Solution

by:
JohnMan777 earned 0 total points
ID: 41571809
This issue was never solved.
 

Author Closing Comment

by:JohnMan777
ID: 41577530
d