Solved

Cannot use Windows Auth (Kerberos) to connect from Failover Cluster nodes

Posted on 2013-01-26
Last Modified: 2013-01-31
We recently installed updates on two domain controllers at what I will call SITE-1, part of a 4-DC Sites and Services configuration. Some time after the updates completed (it did not present immediately), we started to see some strange behavior.  Currently we receive the attached error message (error_sqlconnect) any time we try to connect via SSMS to a Failover Cluster SQL instance from a Failover Cluster node, at both SITE-1 and SITE-2.  One node member at SITE-2 currently holds our 'Production' SQL instance and is working, so luckily we are not hard down because of this, but all replication and mirroring have stopped because of, we think, whatever is causing this error.  We have no fault tolerance: if this single node member goes down, we are down, and my hair would be in hot lava (it is already on fire).

We use Sites and Services in AD to control which DCs can be contacted by which servers, and as far as logging shows, those rules are applying properly for SITE-1 and SITE-2.  I should mention that the Failover Clusters at both sites are affected.  Interestingly, when we try to start the mirroring and replication services, the service accounts are immediately locked out and SSPI failure messages are logged in our SQL logs.  I also want to point out that ALL other functions of

We have an ongoing case open with Microsoft (Sev A), but so far they have been unresponsive and unable to resolve it.  It was opened 50 hours ago; we have had 4 hours of engineer time and have spent the rest simply waiting.  Please ask any questions you'd like and I'll try to answer what I can (within reason).

Things we have done:

- Ran dcdiag and repadmin /showrepl with no errors.
- Verified DNS at both SITES and made sure that ALL node members and instances resolve properly.
- Validated SQL connections and checked logs
- Validated SQL connections from other servers that are not cluster members
- Validated that SQL can authenticate fine to DCs
- Confirmed that any authentication attempt from the cluster nodes causes accounts to lock out, returns the SSPI error in SQL, and produces the attached error.
- Used an SSPI context tool to verify that we can get a successful connection when accounts are not locked out
- Restarted cluster nodes (did not help)
- Used Sites and Services to replicate now between all DCs (both SITE-1 and SITE-2)
- Created new service accounts (these are locking out as well)

Things we have not done:

- Restarted SITE-2 DCs
- Restarted active cluster node
- Destroyed clusters
- Installed fresh

As I try other things and find additional information I will add it to our ASK.  Thanks in advance for any assistance you smart masses provide.
error-sqlconnect.png
Question by:Propay

Expert Comment

by:Ryan McCauley
ID: 38823591
Did the updates you ran (mentioned at the beginning of the question) require a reboot that you've not performed yet? I'm suspicious of those updates, though you also mentioned that the behaviour didn't manifest itself immediately after applying them, but some time afterwards. How much time? Since you've got multiple DCs (and presumably multiple GC servers), could you restart them one at a time, without causing any login issues, to see if that's related? I'd start there since that's where you applied the updates.

Also, not sure it's the updates, but can you try rolling them back? Just another thought.

And just to confirm the problem: you're getting this error when you connect to any cluster instance from any cluster member (except your one-node cluster)? If you have multiple clusters, do you get this error when connecting to instances on other clusters, or only when you connect to instances hosted on the same cluster as the node you're connected to? I'm trying to find some commonality in the issue you're seeing, since it's odd that (it sounds like) you're not seeing it on the single-node cluster you have.

Accepted Solution

by:Propay
ID: 38823664
Thanks for the reply.  This actually was caused by a change made over a year ago (no joke) to LAN Manager settings where we allowed the cluster to send NTLMv1 responses but required NTLMv2 everywhere else.  After restarting the SITE-1 DCs, it reared its ugly head as an issue (never seen before).  Once we set the requirement of NTLMv2 on the cluster nodes, like magic it all started working again.  No more locked accounts.  Because you replied, you get the points.
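
For anyone auditing a similar setup, the mismatch described above can be sketched in a few lines. This is only an illustration, not the poster's actual procedure: it assumes the setting in question is the LmCompatibilityLevel value (0-5) under HKLM\SYSTEM\CurrentControlSet\Control\Lsa, that you have already collected that value from each machine, and the host names and levels shown are hypothetical. The thread does not state the actual values that were in use.

```python
# Sketch: flag client/DC pairs whose LAN Manager authentication levels conflict.
# Assumes the standard meaning of LmCompatibilityLevel (0-5), stored under
# HKLM\SYSTEM\CurrentControlSet\Control\Lsa. Host names below are hypothetical.


def _check(level: int) -> None:
    """Reject values outside the documented 0-5 range."""
    if level not in range(6):
        raise ValueError(f"LmCompatibilityLevel must be 0-5, got {level!r}")


def client_sends_ntlmv2(level: int) -> bool:
    """Levels 3-5 send NTLMv2 responses only; levels 0-2 send LM/NTLM (v1)."""
    _check(level)
    return level >= 3


def dc_refuses_ntlmv1(level: int) -> bool:
    """Only level 5 makes a DC refuse both LM and NTLM (v1) responses."""
    _check(level)
    return level == 5


def find_mismatches(client_levels: dict, dc_levels: dict) -> list:
    """Return sorted (client, dc) pairs where the client still sends v1
    responses but the DC accepts NTLMv2 only."""
    return sorted(
        (client, dc)
        for client, c_level in client_levels.items()
        for dc, d_level in dc_levels.items()
        if not client_sends_ntlmv2(c_level) and dc_refuses_ntlmv1(d_level)
    )


if __name__ == "__main__":
    # Hypothetical inventory: one cluster node left at a v1 level.
    clients = {"NODE-A": 1, "NODE-B": 5}
    dcs = {"SITE1-DC1": 5, "SITE2-DC1": 5}
    for client, dc in find_mismatches(clients, dcs):
        print(f"{client} sends NTLMv1 responses but {dc} requires NTLMv2")
```

A check like this only covers the NTLM level pairing; it says nothing about Kerberos, SPNs, or why the problem surfaced only after the DC restart.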

Expert Comment

by:Marten Rune
ID: 38823709
Two questions and one idea.
Q1: What does the errorlog of the instance say, specifically regarding SPN registration?
Q2: Do you use IPv6, or only IPv4?
Idea: What happens if you log on using a SQL account, i.e. SA or equivalent?

Regards Marten

Expert Comment

by:Marten Rune
ID: 38823711
Sorry, I was using my smartphone and did not see the replies.

Marten

Author Closing Comment

by:Propay
ID: 38838908
My comment was a reposting of the solution I found.
