Servers randomly failing to authenticate correctly with AD

Hello,

I'm running into a very strange issue with my work network. About two months ago, we migrated all of our domain controllers to Windows Server 2012 R2. Everything had been running perfectly fine until recently.

Our current setup involves two Dell R720 Windows Server 2012 R2 Hyper-V hosts; each host runs one DC virtual machine. DC1 is on VHOST1 and DC2 is on VHOST2.

On the weekend of September 27th, we ran into a problem with the NIC on VHOST1. Four days in a row, the host lost all network connectivity. We eventually moved the server's network connection to a separate PCI network card, and that solved that problem. I'm unsure whether this is related to the issue, but I wanted to mention it in case it's part of the problem.

Unfortunately, since Sunday morning we've been running into some bizarre issues. On Sunday morning, one of our servers completely stopped authenticating against Active Directory. Every password I tried returned the error "Username or password is incorrect", even though the username/password combination was 100% correct and worked on other machines. Restarting that server fixed the problem, and authentication started working again.

Yesterday morning, DC1 started doing the same thing. No matter what credentials I gave it, it told me they were wrong. I restarted DC1, and again, the problem went away and authentication succeeded once again.

Last night, another server (different from the other two) stopped authenticating in the same way. Luckily, I was able to use local credentials to get into the server and restart it. Once again, the restart fixed the problem.

I've checked the event logs, and everything looks fine except for my failed logins. I've attached a few examples that might point to a problem, but errors in the event logs are very uncommon. The replication error appears only once or twice a day, which I don't personally see as the cause.

DCDiag shows only one problem, but it seems unrelated to the issue:

      Starting test: FrsEvent
         There are warning or error events within the last 24 hours after the
         SYSVOL has been shared.  Failing SYSVOL replication problems may cause
         Group Policy problems.

I've attached the log from dcdiag as well.

Replication seems to be operating smoothly between the two DCs.

I'm very confused as to what could be causing these problems.

Anyone have any ideas? Have you run into these problems before?

Any help would be much appreciated.

Thank you,

John Caspary
Event-Log-AD-DS-1.txt
Event-Log-AD-DS-2.txt
Event-Log-Failed-Login-SunMorning.txt
Event-Log-Failed-Login1.txt
DC-DIAG.txt
Asked by JohnMan777

Chris Dent (PowerShell Developer) commented:
Hmm, can you tell us a bit about your DNS settings (the TCP/IP configuration on both DCs)?

And, if you open the DNS console, can you tell us which zones you can see? Please also detail whether or not you have an _msdcs.businessname.local zone, or you see an _msdcs folder underneath businessname.local.

You have a lot of (apparently) missing DNS records in the log.

Authentication depends on correct DNS configuration; I'd hesitate to explore anything else until the state of that dependency is clear.

Chris
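To make the "missing DNS records" point concrete, here is a minimal Python sketch (Python is an editorial choice; nothing in the thread uses it) that compares a set of DNS names, for example copied out of a zone export, against the core SRV records a domain controller is expected to register. The record list is a deliberate subset, `businessname.local` is Chris's placeholder domain, and the sketch assumes a single-domain forest in which every DC is a global catalog. The function names are hypothetical.

```python
"""Sketch: which core DC SRV records are missing from a set of DNS names?

Assumes a single-domain forest (domain DNS name == forest DNS name) and that
the DCs are global catalogs. The record list is a subset of what Netlogon
registers, not the complete set.
"""

# Core SRV owner names a DC registers (subset, for illustration only).
CORE_SRV_TEMPLATES = [
    "_ldap._tcp.{d}",
    "_ldap._tcp.dc._msdcs.{d}",
    "_kerberos._tcp.{d}",
    "_kerberos._udp.{d}",
    "_kerberos._tcp.dc._msdcs.{d}",
    "_kpasswd._tcp.{d}",
    "_kpasswd._udp.{d}",
    "_gc._tcp.{d}",               # only present if the DC is a global catalog
    "_ldap._tcp.gc._msdcs.{d}",   # likewise GC-only
]


def expected_srv_names(domain: str) -> list[str]:
    """Build the expected SRV owner names for a domain, lower-cased."""
    d = domain.strip(".").lower()
    return [t.format(d=d) for t in CORE_SRV_TEMPLATES]


def missing_srv_records(domain: str, present_names: set[str]) -> list[str]:
    """Return expected SRV names that do not appear in present_names.

    DNS names are case-insensitive, and exports may or may not end names
    with the root '.', so both sides are normalized before comparing.
    """
    normalized = {n.strip(".").lower() for n in present_names}
    return [n for n in expected_srv_names(domain) if n not in normalized]


if __name__ == "__main__":
    # Hypothetical names, as if copied out of a zone export.
    present = {
        "_ldap._tcp.businessname.local.",
        "_ldap._tcp.dc._msdcs.businessname.local.",
        "_kerberos._tcp.businessname.local.",
    }
    for name in missing_srv_records("businessname.local", present):
        print("missing:", name)
```

On a live DC, the authoritative list of records Netlogon tries to register is in `%SystemRoot%\System32\config\netlogon.dns`, and `dcdiag /test:dns` checks registration directly; the sketch is only a way to eyeball an exported zone.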
compdigit44 commented:
Can you post the results of the following:

repadmin /showrepl

1) Is the time correct?
2) By any chance, is the new NIC's MAC address listed in a DHCP reservation?
3) Are there any errors on the switch port the NIC is connected to?
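Question 1 matters because Kerberos rejects authentication when the client and DC clocks differ by more than the domain's "Maximum tolerance for computer clock synchronization" policy, which defaults to 5 minutes. A minimal Python sketch of that comparison, assuming you already have both timestamps as timezone-aware values; the function names are hypothetical, and treating an offset of exactly 5 minutes as acceptable is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Assumed to match the AD default for the Kerberos policy
# "Maximum tolerance for computer clock synchronization" (5 minutes).
DEFAULT_MAX_SKEW = timedelta(minutes=5)


def clock_skew(client_time: datetime, dc_time: datetime) -> timedelta:
    """Absolute difference between two timezone-aware timestamps."""
    return abs(client_time - dc_time)


def within_kerberos_skew(client_time: datetime, dc_time: datetime,
                         max_skew: timedelta = DEFAULT_MAX_SKEW) -> bool:
    """True if the skew is at or below the tolerance (inclusive boundary assumed)."""
    return clock_skew(client_time, dc_time) <= max_skew


if __name__ == "__main__":
    dc = datetime(2014, 10, 9, 0, 0, 0, tzinfo=timezone.utc)
    # Hypothetical member-server clocks, 90 seconds and 7 minutes ahead of the DC.
    for offset in (timedelta(seconds=90), timedelta(minutes=7)):
        server = dc + offset
        print(offset, "ok" if within_kerberos_skew(server, dc) else "too far off")
```

One way to read the real offset from an affected server is `w32tm /stripchart /computer:DC1`, which samples the time difference against the named DC.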
JohnMan777 (Author) commented:
Chris,

Thank you for the tip. I've cleaned up DNS, and the tests are now all passing.

I'll keep an eye on things for the next few days and see if it resolves the problem.

Thank you,

John Caspary
JohnMan777 (Author) commented:
Chris,

The issues unfortunately repeated themselves again this morning. At about midnight, DC2 was locked out. About 2 hours later, another server did the same.

Same error when trying to log in: "Invalid Username or Password".

This morning I turned off Kerberos authentication on DC2 (luckily I was already logged into the server), and authentication started working, although very slowly. When I turned the Kerberos service back on, authentication failed again.

I restarted DC2 and everything started working normally again, but unfortunately the other server (RIP6) didn't recover on its own. I had to restart it.

I'm pretty stumped.

compdigit44, I've attached the results. The logs were pulled right after DC2 was restarted, so maybe the errors signify something?

Time is fully in sync between both servers. The NIC isn't reserved, and there are no switch port errors.

Thanks,

John
Repadmin-DC1.txt
Repadmin-DC2.txt
compdigit44 commented:
Can you please upload a screenshot of your AD DNS structure? You can block out any private information, of course.

I see errors in one of your repadmin uploads.

Can you please run the following command on DC1 and DC2:

dcdiag /test:checksecurityerror
JohnMan777 (Author) commented:
How would you prefer I do that? A screenshot of what's in _msdcs?
compdigit44 commented:
Yes, as much information as you can upload would be great.
JohnMan777 (Author) commented:
compdigit44,

Please see the attached DNS export.

Thanks,

John
DNS-Export-msdcs.txt
Chris Dent (PowerShell Developer) commented:
You don't take snapshots, do you? (No is good.)

Otherwise, it would be great to see whether there are any events in the other logs too (on both DCs).

Chris
Chris Dent (PowerShell Developer) commented:
_msdcs looks fine. Can we have the service records from the parent zone too?

Chris
JohnMan777 (Author) commented:
I'm sorry, what exactly do you mean by service records from the parent zone?

Thanks,

John
JohnMan777 (Author) commented:
I ran an extended diagnostic test on the DC. Maybe this will help?

Please see attached.

Thank you,

John Caspary
dcdiag-extended.txt
Chris Dent (PowerShell Developer) commented:
That's a remarkably, and frustratingly, clean report. You don't often see them like that.

Perhaps check the log files in c:\Windows\debug and the event logs again (around the time of the failure), including the Security log.

If that fails, we can always increase logging. It's a bit tricky sorting the good from the bad if we do, though.
compdigit44 commented:
Your repadmin output had the following error: "The target principal name is incorrect".

This could point to DNS, computer account (Kerberos), or SPN problems.

Have you read the following EE article? http://www.experts-exchange.com/Database/MS-SQL-Server/Q_28417538.html
JohnMan777 (Author) commented:
Chris,

I looked deeper into it, and I can't find anything that sticks out. In fact, the "Directory Service" log shows almost nothing:

10/9/2014 12:23AM - Internal event: The Address Book hierarchy table has been rebuilt.
10/9/2014 12:39AM - NTDS (760) NTDSA: Online defragmentation is beginning a full pass on database 'C:\Windows\NTDS\ntds.dit'.
10/9/2014 12:39AM - NTDS (760) NTDSA: Online defragmentation has completed a full pass on database 'C:\Windows\NTDS\ntds.dit', freeing 4 pages. This pass started on 10/9/2014 and ran for a total of 0 seconds, requiring 1 invocations over 1 days. Since the database was created it has been fully defragmented 61 times.
10/9/2014 4:31AM - NTDS (760) NTDSA: Shadow copy instance 16 freeze started.

In the System log, there are a lot of errors regarding authentication while the machine was locked out. That's not surprising, though.
JohnMan777 (Author) commented:
compdigit44,

I've attached the SPN lists for the two DCs. Do you see anything that sticks out?

Thanks,

John
DC1-SPN-LIST.txt
DC2-SPN-LIST.txt
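One thing that can "stick out" in SPN lists is the same SPN registered on two different accounts, since duplicate SPNs are a known cause of Kerberos failures. The following Python sketch looks for that across the two attached files. It assumes each file is plain `setspn -L` output, that is, a "Registered ServicePrincipalNames for <DN>:" header followed by indented SPNs; the function names are hypothetical. On a 2012 R2 DC, `setspn -X` runs a similar duplicate search directly against the directory.

```python
from collections import defaultdict

# Assumed header format of `setspn -L` output; each header is followed by
# one indented SPN per line.
HEADER = "Registered ServicePrincipalNames for "


def parse_setspn_list(text: str) -> dict[str, list[str]]:
    """Map each account DN to the SPNs listed under it."""
    accounts: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(HEADER):
            # The header ends with a colon after the account's distinguished name.
            current = line[len(HEADER):].rstrip(":")
            accounts.setdefault(current, [])
        elif current is not None:
            accounts[current].append(line)
    return accounts


def find_duplicate_spns(*listings: str) -> dict[str, set[str]]:
    """Return SPNs (lower-cased) that are registered on more than one account."""
    owners: dict[str, set[str]] = defaultdict(set)
    for text in listings:
        for account, spns in parse_setspn_list(text).items():
            for spn in spns:
                # SPN comparison is case-insensitive, so normalize before grouping.
                owners[spn.lower()].add(account)
    return {spn: accts for spn, accts in owners.items() if len(accts) > 1}


if __name__ == "__main__":
    # Hypothetical listings standing in for DC1-SPN-LIST.txt and DC2-SPN-LIST.txt.
    dc1 = ("Registered ServicePrincipalNames for CN=DC1,OU=Domain Controllers,DC=corp,DC=local:\n"
           "\tldap/DC1.corp.local\n\tHOST/SHARED\n")
    dc2 = ("Registered ServicePrincipalNames for CN=DC2,OU=Domain Controllers,DC=corp,DC=local:\n"
           "\tldap/DC2.corp.local\n\thost/shared\n")
    for spn, accounts in find_duplicate_spns(dc1, dc2).items():
        print(spn, "->", sorted(accounts))
```

To run it on the attachments, read each file into a string first; if the output was redirected from PowerShell, the file may be UTF-16 and need `encoding="utf-16"` when opened.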
JohnMan777 (Author) commented:
This issue was never solved.