Asked by cwalter77

Kerberos Failing on Workstations on Domain

Hello, and thanks in advance for any help you can provide me.

We have a domain with Active Directory. There are about 3,000 client PCs running Windows XP Pro and about 5-7 domain controllers running Windows Server 2003. We are encountering the following problem on only a handful of computers right now, but the issue seems to be growing slowly. Some users in our field offices connect to a SharePoint website to view reports that are specific to their area. SharePoint decides which view they see based on their logon credentials. When User #1 accesses this site on PC #1, they are unable to view the report: the page itself comes up, but the options for selecting specific weeks, etc. do not appear.
Now if this same User #1 goes to PC #2, they can get to the page and view the report without any issue. So this tells me the issue could be with the PC itself.
But when User #2 logs into PC #1 and goes to the site, they have no issues...

We ran NetDiag on PC #1 while it was logged in under each of these two user IDs. The resulting log files were identical except for the following two tests.
Running NetDiag on PC #1 while logged in under User #1's ID, we get the following:
Trust relationship test. . . . . . : Passed
    Secure channel for domain '(DOMAIN NAME)' is to '(DOMAIN CONTROLLER #1.DOMAIN.COM)'.
Kerberos test. . . . . . . . . . . : Failed
    [FATAL] Cannot get ticket cache from Kerberos.
    The error occurred was: (null)

We then log into the same PC #1 as User #2 and run NetDiag again:

Trust relationship test. . . . . . : Passed
    Secure channel for domain '(DOMAIN NAME)' is to '(DOMAIN CONTROLLER #2.DOMAIN.COM)'.
Kerberos test. . . . . . . . . . . : Passed

So they are authenticating the trust relationship with two different Controllers (domain controller #1 and #2), and User #1 is getting Kerberos Failures.
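
For anyone comparing NetDiag logs across a lot of PCs, the per-test results can be pulled out and diffed with a small script. This is only a minimal sketch: it assumes the summary lines look like the ones quoted above ("name. . . : Passed/Failed"), treats "Skipped" as a possible result, and the function names `parse_results` and `diff_results` are made up for illustration.

```python
import re

# Matches NetDiag summary lines such as
#   "Kerberos test. . . . . . . . . . . : Failed"
# Assumption: results are one of Passed / Failed / Skipped.
RESULT_LINE = re.compile(
    r"^\s*(?P<name>.+?)(?:\s*\.)+\s*:\s*(?P<result>Passed|Failed|Skipped)\s*$"
)


def parse_results(log_text):
    """Return {test name: result} for every summary line in a NetDiag log."""
    results = {}
    for line in log_text.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            results[match.group("name").strip()] = match.group("result")
    return results


def diff_results(log_a, log_b):
    """Return {test name: (result in log_a, result in log_b)} where they differ."""
    a, b = parse_results(log_a), parse_results(log_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

Feeding it the two excerpts above would report only the Kerberos test, since the trust relationship test passed in both logs. Detail lines (secure channel, [FATAL] messages) are ignored by this sketch.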

Most of the time, re-adding the PC to the domain fixes the issue - but now I am getting more calls about this, and I even have a PC that showed repeat behavior after being re-added to the domain about a month ago. To make things even more interesting, the problem is only associated with this particular website in our SharePoint environment... all other internal websites and/or applications work fine when the user needs to authenticate (at least as far as I can tell).

A few questions:  
1.  What is causing this Kerberos failure for one user, but not another on the same PC?
2.  Why is it only linked to that specific site?
3.  Is there any other test that I can do to help troubleshoot this issue?

So far we have tried installing the Windows Resource Kit Tools (kerbtray) to see if Kerberos is running, but no luck: it doesn't list any tickets. On some PCs, Kerberos works for one user but not for another, and rejoining the PC to the domain fixes the problem.
But we have found other PCs where Kerberos doesn't work for anybody, and rejoining the domain does not fix the issue.
How do we fix the Kerberos issue? What causes Kerberos to stop working for some users?
Thanks again for any information! We appreciate your help.
Tags: Microsoft Legacy OS, Windows XP, Active Directory

Last Comment: 8/22/2022 - Mon

I wanted to re-open this question because we have been working with Microsoft on this issue. I had posted this question twice here at Experts Exchange and had not had any luck with a resolution. I do appreciate everyone's help, and wanted to share the information we got from Microsoft in case any other users out there ever come across this problem.

This is Microsoft's suggestion.

1. Change the registry value HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon\CachedLogonsCount from 10 to 1
2. Reboot and log in as the user

Some background on this setting:

Determines how many user account entries Windows saves in the logon cache on the local computer. Windows saves the user account data that is used to log on to the computer so the data can be used if the user's domain controller is unavailable.
If the value of this entry is 0, Windows does not save any user account data in the logon cache. In that case, if the user's domain controller is not available and a user tries to log on to a computer that does not have the user's account information, Windows displays the following message:
"The system cannot log you on now because the domain <Domain-name> is not available."

I will post whether or not this fully fixed our issue.

Hi, it's me again.
Have you looked into whether it is a duplicate SPN issue?
There is a related post here:

I should mention, I find Method 2 in the M$ article to be the easiest.
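
If you can export each account's servicePrincipalName values from AD, spotting duplicates is easy to script. This is a rough sketch only: the input format (a dict of account name to SPN list), the function name `find_duplicate_spns`, and the sample account names are all hypothetical, and it assumes SPNs should be compared case-insensitively.

```python
from collections import defaultdict


def find_duplicate_spns(spns_by_account):
    """Return {spn: [accounts]} for every SPN registered on more than one account.

    `spns_by_account` maps an account name to its list of SPN strings
    (hypothetical input format, e.g. built from a directory export).
    """
    owners = defaultdict(set)
    for account, spns in spns_by_account.items():
        for spn in spns:
            # Assumption: SPNs are compared case-insensitively.
            owners[spn.strip().lower()].add(account)
    return {
        spn: sorted(accounts)
        for spn, accounts in owners.items()
        if len(accounts) > 1
    }


if __name__ == "__main__":
    sample = {
        "svc_sharepoint": ["HTTP/reports.example.com"],
        "WEB01$": ["http/reports.example.com", "HOST/web01.example.com"],
    }
    print(find_duplicate_spns(sample))
```

Any SPN that shows up under two accounts is worth a closer look, especially HTTP/ entries for the SharePoint site's host name.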

We will check it out and post the results.  Thanks!

Changing the setting in Group Policy for cachedlogons from 10 to 1 works on the Desktop workstations.  We are not implementing this on laptops though for the obvious reasons.

Our server team is still researching the servers as suggested by Pber above.

Unfortunately over the past few days, we are starting to get repeat offenders even with the registry key changed.  I guess we are back to the drawing board....

I couldn't figure out why M$ suggested that setting; it didn't seem like it would work on your problem. Sorry it didn't solve it though.
Any success on the SPN front?
I'm not sure if this is related, but I'll mention it as we've had Kerberos issues in certain scenarios. By default, when a user logs on, the client attempts the Kerberos negotiation over UDP first. The problem with UDP is that if the packets fragment and are received in the wrong order, the Kerberos negotiation fails. It then drops down to NTLM authentication, and logons sometimes take a long time because of retries.
We've seen this with specific users who have a large number of group memberships, resulting in tokens larger than the default MTU size of 1500 bytes. This might explain why some users work and others don't on the same machine. We've also seen this with users authenticating across firewalls.
We fixed our issues like this by forcing Kerberos to use TCP instead of UDP. This M$ article goes into some background and how to change it.
The nice thing is that you can change this setting on a few machines and test. We ultimately went the GPO path after we proved our results. I believe that in Vista and 2008 they implemented a newer RFC spec for Kerberos which already forces Kerberos to use TCP.
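
For testing on a handful of machines, the change can be scripted as a .reg file. Treat this as a sketch under assumptions rather than the exact procedure from the article: to my knowledge the setting is the MaxPacketSize DWORD under the Lsa\Kerberos\Parameters key, and a value of 1 makes the client use TCP for essentially every Kerberos request. The function name `kerberos_tcp_reg` is made up for illustration.

```python
# Assumption: this is the key and value name that control when the Kerberos
# client switches from UDP to TCP; verify against Microsoft's documentation.
KERBEROS_PARAMS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters"
)


def kerberos_tcp_reg(max_packet_size=1):
    """Build the text of a .reg file that sets MaxPacketSize (REG_DWORD).

    Requests larger than MaxPacketSize bytes go over TCP, so the default of 1
    here is intended to force TCP for practically all Kerberos traffic.
    """
    if not 0 <= max_packet_size <= 0xFFFFFFFF:
        raise ValueError("MaxPacketSize must fit in a 32-bit DWORD")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KERBEROS_PARAMS_KEY}]",
        # .reg files store DWORD values as eight hexadecimal digits.
        f'"MaxPacketSize"=dword:{max_packet_size:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(kerberos_tcp_reg())
```

Import the generated file on a couple of the affected PCs, reboot, and re-run NetDiag as the failing user to see whether the Kerberos test passes before going the GPO route.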
