Solved

Servers randomly failing to authenticate correctly with AD

Posted on 2014-10-07
Last Modified: 2016-11-23
Hello,

I'm running into a very strange issue with my work network. About two months ago we migrated all of our domain controllers to server 2012 R2. Everything has been running perfectly fine until recently.

Our current setup involves two Dell R720 Windows Server 2012 R2 Hyper-V hosts; each host has a DC virtual machine. DC1 is on VHOST1 and DC2 is on VHOST2.

On the weekend of September 27th, we ran into a problem with the NIC on VHOST1. Four days in a row, the host lost all connections to the network. We eventually moved the server's network connection to a separate PCI network card, and that solved that problem. I'm unsure if this is related to the issue, but I want to mention it in case it's part of the problem.

Unfortunately, since Sunday morning we've been running into some bizarre issues. On Sunday morning, one of our servers completely stopped authenticating against Active Directory. Every password I tried gave back the error "Username or password is incorrect" even though the username/password combination was 100% correct and worked on other machines. A restart of that server fixed the problem and authentication started working again.

Yesterday morning, DC1 started doing the same thing. No matter what credentials I gave it, it told me they were wrong. I restarted DC1, and again, the problem went away and authentication succeeded once again.

Last night, another server (different from the other two) stopped authenticating like the others did. Luckily I was able to use local credentials to get into the server and restart it. Once again, a restart fixed the problem.

I've checked the event logs and everything looks fine apart from the failed logins. I've attached a few examples of events that might point to a problem, but errors in the event logs are rare. The replication error appears only once or twice a day, so I don't personally see it as the cause.

DCDiag shows only one problem but it seems unrelated to the issue:

      Starting test: FrsEvent
         There are warning or error events within the last 24 hours after the
         SYSVOL has been shared.  Failing SYSVOL replication problems may cause
         Group Policy problems.

I've attached the log from dcdiag as well.

Replication seems to be operating smoothly between the two DCs.

I'm very confused as to what could be causing these problems.

Anyone have any ideas? Have you run into these problems before?

Any help would be much appreciated.

Thank you,

John Caspary
Event-Log-AD-DS-1.txt
Event-Log-AD-DS-2.txt
Event-Log-Failed-Login-SunMorning.txt
Event-Log-Failed-Login1.txt
DC-DIAG.txt
Question by:JohnMan777
18 Comments
 
LVL 71

Expert Comment

by:Chris Dent
ID: 40367745
Hmm can you tell us a bit about your DNS settings (TCP/IP configuration, both DCs)?

And, if you open the DNS console, can you tell us which zones you can see? Please also detail whether or not you have an _msdcs.businessname.local zone, or you see an _msdcs folder underneath businessname.local.

You have a lot of (apparently) missing DNS records in the log.

Authentication depends on correct DNS configuration; I'd hesitate to explore anything else until the state of that dependency is clear.

Chris
0
 
LVL 20

Expert Comment

by:compdigit44
ID: 40369228
Can you post the results of the following:

repadmin /showrepl

1) Is the time correct?
2) By any chance is the new NIC's MAC address listed for a reservation in DHCP?
3) Any errors on the switch port the NIC is connected to?
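
Background on question 1: Kerberos rejects authentication when the clocks involved drift too far apart, and the default tolerance in an AD domain is 5 minutes. A minimal Python sketch of that comparison, assuming you have already collected a clock reading from each machine by some other means (the function name and the sample times are made up for illustration):

```python
from datetime import datetime, timedelta

# Default Kerberos tolerance for clock differences in an AD domain.
DEFAULT_MAX_SKEW = timedelta(minutes=5)


def within_kerberos_skew(clock_a, clock_b, max_skew=DEFAULT_MAX_SKEW):
    """Return True if two clock readings differ by no more than max_skew."""
    return abs(clock_a - clock_b) <= max_skew


# Hypothetical readings taken from two machines at the same moment.
dc1_clock = datetime(2014, 10, 8, 9, 0, 0)
member_clock = datetime(2014, 10, 8, 9, 6, 30)

print(within_kerberos_skew(dc1_clock, member_clock))
```

If the tolerance has been changed in the domain's Kerberos policy, pass that value as max_skew instead.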
0
 

Author Comment

by:JohnMan777
ID: 40369233
Chris,

Thank you for the tip. I've cleaned up the DNS and the tests are coming back passing 100%.

I'll keep an eye on things for the next few days and see if it resolves the problem.

Thank you,

John Caspary
0

 

Author Comment

by:JohnMan777
ID: 40370961
Chris,

The issues unfortunately repeated themselves again this morning. At about midnight, DC2 was locked out. About 2 hours later, another server did the same.

Same error when trying to log in "Invalid Username or Password".

This morning I turned off Kerberos Authentication on DC2 (luckily I was already logged into the server) and authentication (although very slow) started working. Turned Kerberos services back on, authentication failed again.

I restarted DC2 and everything started working normally again but unfortunately the other server (RIP6) didn't fix itself. I had to restart it.

I'm pretty stumped.

compdigit44, I've attached the results. The logs were pulled right after DC2 was restarted so maybe the errors could signify something?

Timing is 100% synced between both servers. The NIC isn't reserved and there are no switch port errors.

Thanks,

John
Repadmin-DC1.txt
Repadmin-DC2.txt
0
 
LVL 20

Expert Comment

by:compdigit44
ID: 40371111
Can you please upload a screenshot of your AD DNS structure? You can block out any private information, of course.

I see errors in one of your repadmin uploads.

Can you please run the following command on DC1 and DC2:

dcdiag /test:checksecurityerror
0
 

Author Comment

by:JohnMan777
ID: 40371119
How would you prefer I do that? Screenshot of what's in _msdcs. ?
0
 
LVL 20

Expert Comment

by:compdigit44
ID: 40371129
Yes... as much info as you can upload would be great...
0
 

Author Comment

by:JohnMan777
ID: 40371143
compdigit44,

Please see the attached DNS export.

Thanks,

John
DNS-Export-msdcs.txt
0
 
LVL 71

Expert Comment

by:Chris Dent
ID: 40371146
You don't take snapshots, do you? (No is good).

Otherwise it would be great to see if there are any events in the other logs too (both DCs).

Chris
0
 
LVL 71

Expert Comment

by:Chris Dent
ID: 40371151
_msdcs looks fine. Can we have the service records from the parent zone too?

Chris
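
For reference, the service (SRV) records in question are the ones a domain controller registers in DNS so that clients can locate LDAP and Kerberos services. A rough Python sketch that lists the usual names, split by whether they sit under _msdcs or directly in the parent zone. It assumes a single-domain forest named businessname.local (the placeholder used earlier in this thread), leaves out the site-specific records, and the function name is made up for illustration:

```python
def expected_srv_records(domain):
    """Return common DC locator SRV names, grouped by where they live in DNS."""
    msdcs = [
        f"_ldap._tcp.dc._msdcs.{domain}",
        f"_kerberos._tcp.dc._msdcs.{domain}",
        f"_ldap._tcp.pdc._msdcs.{domain}",
        f"_ldap._tcp.gc._msdcs.{domain}",  # assumes domain is also the forest root
    ]
    parent = [
        f"_ldap._tcp.{domain}",
        f"_kerberos._tcp.{domain}",
        f"_kerberos._udp.{domain}",
        f"_kpasswd._tcp.{domain}",
        f"_kpasswd._udp.{domain}",
        f"_gc._tcp.{domain}",  # assumes domain is also the forest root
    ]
    return {"_msdcs": msdcs, "parent": parent}


records = expected_srv_records("businessname.local")
for location, names in records.items():
    for name in names:
        print(location, name)
```

Each printed name can then be compared against what the DNS console shows, or looked up with nslookup after "set type=SRV".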
0
 

Author Comment

by:JohnMan777
ID: 40371186
I'm sorry. What exactly do you mean by service records from the parent zone?

Thanks,

John
0
 

Author Comment

by:JohnMan777
ID: 40371223
I ran an extended diagnostic test on the DC. Maybe this could help?

Please see attached.

Thank you,

John Caspary
dcdiag-extended.txt
0
 
LVL 71

Expert Comment

by:Chris Dent
ID: 40371240
That's a remarkably, and frustratingly, clean report. You don't often see them like that.

Perhaps check the log files in c:\Windows\debug and event logs again (around the time of the failure), including security.

If that fails, we can always increase logging. Bit tricky sorting the good from the bad if we do though.
0
 
LVL 20

Expert Comment

by:compdigit44
ID: 40371585
Your repadmin output had the following error: "The target principal name is incorrect".

That could point to DNS, computer account (Kerberos) or SPN problems.

Have you read the following EE article? http://www.experts-exchange.com/Database/MS-SQL-Server/Q_28417538.html
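
On the SPN side, one thing worth ruling out is the same SPN being registered on more than one account, which is a known cause of Kerberos failures. A small Python sketch of that comparison, assuming the SPN lists have already been exported into a mapping of account name to SPNs and that SPNs are compared case-insensitively. The account names and SPN values below are invented for illustration and are not taken from any attachment in this thread:

```python
from collections import defaultdict


def duplicate_spns(spns_by_account):
    """Return {spn: sorted accounts} for SPNs registered on more than one account."""
    owners = defaultdict(set)
    for account, spns in spns_by_account.items():
        for spn in spns:
            # Lower-case so that HOST/DC1 and host/dc1 count as the same SPN.
            owners[spn.lower()].add(account)
    return {spn: sorted(accts) for spn, accts in owners.items() if len(accts) > 1}


# Hypothetical data: a stale account still holding an SPN that belongs to DC1.
sample = {
    "DC1$": ["ldap/DC1.businessname.local", "HOST/DC1.businessname.local"],
    "DC2$": ["ldap/DC2.businessname.local", "HOST/DC2.businessname.local"],
    "OLDDC1$": ["HOST/dc1.businessname.local"],
}

print(duplicate_spns(sample))
```

An empty result means no SPN in the input is shared between accounts; it says nothing about SPNs that are missing altogether.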
0
 

Author Comment

by:JohnMan777
ID: 40371991
Chris,

I looked deeper into it and I can't find anything that sticks out. In fact, the "Directory Service" logs show almost nothing.

10/9/2014 12:23AM - Internal event: The Address Book hierarchy table has been rebuilt.
10/9/2014 12:39AM - NTDS (760) NTDSA: Online defragmentation is beginning a full pass on database 'C:\Windows\NTDS\ntds.dit'.
10/9/2014 12:39AM - NTDS (760) NTDSA: Online defragmentation has completed a full pass on database 'C:\Windows\NTDS\ntds.dit', freeing 4 pages. This pass started on 10/9/2014 and ran for a total of 0 seconds, requiring 1 invocations over 1 days. Since the database was created it has been fully defragmented 61 times.
10/9/2014 4:31AM - NTDS (760) NTDSA: Shadow copy instance 16 freeze started.

In the System Log, there are a lot of errors in regards to authentication while the machine was locked out. That's not surprising though.
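
To narrow entries like the ones quoted above down to a window around a lockout, something along these lines may help. It is a minimal Python sketch that assumes the log has been exported as text in the same "date time - message" shape and an English locale for the AM/PM marker. The failure time used here (midnight on 10/9/2014) is a placeholder, and the function names are made up for illustration:

```python
from datetime import datetime, timedelta


def parse_entry(line):
    """Split 'M/D/YYYY H:MMAM - message' into (timestamp, message)."""
    stamp, message = line.split(" - ", 1)
    return datetime.strptime(stamp, "%m/%d/%Y %I:%M%p"), message


def events_near(lines, failure_time, window):
    """Return parsed entries whose timestamp is within window of failure_time."""
    parsed = [parse_entry(line) for line in lines]
    return [(ts, msg) for ts, msg in parsed if abs(ts - failure_time) <= window]


log_lines = [
    r"10/9/2014 12:23AM - Internal event: The Address Book hierarchy table has been rebuilt.",
    r"10/9/2014 12:39AM - NTDS (760) NTDSA: Online defragmentation is beginning a full pass on database 'C:\Windows\NTDS\ntds.dit'.",
    r"10/9/2014 4:31AM - NTDS (760) NTDSA: Shadow copy instance 16 freeze started.",
]

# Placeholder failure time; substitute the moment the lockout was first noticed.
failure = datetime(2014, 10, 9, 0, 0)

for ts, msg in events_near(log_lines, failure, timedelta(minutes=30)):
    print(ts, msg)
```

The same filter can be run over exports of the System and Security logs so that all three can be read side by side for the same window.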
0
 

Author Comment

by:JohnMan777
ID: 40373622
compdigit44,

I've attached the SPN lists for the two DCs. Do you see anything that sticks out?

Thanks,

John
DC1-SPN-LIST.txt
DC2-SPN-LIST.txt
0
 

Accepted Solution

by:
JohnMan777 earned 0 total points
ID: 41571809
This issue was never solved.
0
 

Author Closing Comment

by:JohnMan777
ID: 41577530
d
0
