AD authentication timeout

Hi, in relation to another post i have  (a UTM box not authenticating to AD)
There techs have sent me logs showing that AD is taking too long to respond (25+ seconds)

same logs from a test lab take less than 1 second.

any ideas on how to speed up? or what may be causing the slow down.
I have posted a little of log  not sure if that helps or not.

it fails on a number of DCs completly (within 1 sec) then start to work but takes too long
14:18:54, 10libsmb/namequery.c:saf_store(71)
saf_store: domain = [interquad], server = [10.20.20.101], expire

libads/ldap.c:ads_try_connect (127)
ads_try_connect: sending CLDAP reques to 10.20.20.101, expire = [1202394834] [2008/02/07 14:19:19, 3]
libads/kerberos_keytab.c:smb_krb5_kt_add_entry(184)
ads_keytab_add_entry: adding keytab entry for blah blah with encryption type (3)  and Version (33)
14:19;19, 3]


and there closing text was we reccomend that the administrator of this domain dive in to AD set up and find source of the delay is. im the administrator and havent a clue   please help.
mhamerAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

stronglineCommented:
what type of app generated that log? looks like a 3rd party software tried to use kerberos for authenication. This alone is not sufficient to prove that AD is in trboule.

did you have any other difficulties in terms of logging into the domain, accessing domain resources, etc?
did you check your DC event viewer? was there any error/warnings?
how about dcdiag output? Make sure your DNS is configured properly.
0
mhamerAuthor Commented:
dcdiag  looks fine

Domain Controller Diagnosis

Performing initial setup:
   Done gathering initial info.

Doing initial required tests
   
   Testing server: locale\DC1
      Starting test: Connectivity
         ......................... DC1 passed test Connectivity

Doing primary tests
   
   Testing server: Slough\DC1
      Starting test: Replications
         ......................... DC1 passed test Replications
      Starting test: NCSecDesc
         ......................... DC1 passed test NCSecDesc
      Starting test: NetLogons
         ......................... DC1 passed test NetLogons
      Starting test: Advertising
         ......................... DC1 passed test Advertising
      Starting test: KnowsOfRoleHolders
         ......................... DC1 passed test KnowsOfRoleHolders
      Starting test: RidManager
         ......................... DC1 passed test RidManager
      Starting test: MachineAccount
         ......................... DC1 passed test MachineAccount
      Starting test: Services
         ......................... DC1 passed test Services
      Starting test: ObjectsReplicated
         ......................... DC1 passed test ObjectsReplicated
      Starting test: frssysvol
         ......................... DC1 passed test frssysvol
      Starting test: frsevent
         ......................... DC1 passed test frsevent
      Starting test: kccevent
         ......................... DC1 passed test kccevent
      Starting test: systemlog
         An Error Event occured.  EventID: 0xC0001811
            Time Generated: 02/07/2008   16:39:01
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0xC0001811
            Time Generated: 02/07/2008   16:47:03
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0xC0001811
            Time Generated: 02/07/2008   16:54:23
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0xC0001811
            Time Generated: 02/07/2008   16:54:28
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0xC0001811
            Time Generated: 02/07/2008   16:54:40
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0xC0001811
            Time Generated: 02/07/2008   16:55:31
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0xC0001811
            Time Generated: 02/07/2008   16:57:25
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0xC0001811
            Time Generated: 02/07/2008   17:03:04
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0xC0001811
            Time Generated: 02/07/2008   17:03:12
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0xC0001811
            Time Generated: 02/07/2008   17:06:40
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0xC0001811
            Time Generated: 02/07/2008   17:07:43
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0xC0001811
            Time Generated: 02/07/2008   17:10:43
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0xC0001811
            Time Generated: 02/07/2008   17:11:12
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0xC0001811
            Time Generated: 02/07/2008   17:13:32
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0xC0001811
            Time Generated: 02/07/2008   17:16:25
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0xC0001811
            Time Generated: 02/07/2008   17:22:56
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0xC0001811
            Time Generated: 02/07/2008   17:24:50
            (Event String could not be retrieved)
         An Error Event occured.  EventID: 0xC0001811
            Time Generated: 02/07/2008   17:29:53
            (Event String could not be retrieved)
         ......................... DC1 failed test systemlog
      Starting test: VerifyReferences
         ......................... DC1 passed test VerifyReferences
   
   Running partition tests on : ForestDnsZones
      Starting test: CrossRefValidation
         ......................... ForestDnsZones passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... ForestDnsZones passed test CheckSDRefDom
   
   Running partition tests on : DomainDnsZones
      Starting test: CrossRefValidation
         ......................... DomainDnsZones passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... DomainDnsZones passed test CheckSDRefDom
   
   Running partition tests on : Schema
      Starting test: CrossRefValidation
         ......................... Schema passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Schema passed test CheckSDRefDom
   
   Running partition tests on : Configuration
      Starting test: CrossRefValidation
         ......................... Configuration passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Configuration passed test CheckSDRefDom
   
   Running partition tests on : domainone
      Starting test: CrossRefValidation
         ......................... domainone passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... domainone passed test CheckSDRefDom
   
   Running enterprise tests on : domainone.com
      Starting test: Intersite
         ......................... domainone.com passed test Intersite
      Starting test: FsmoCheck
         ......................... domainone.com passed test FsmoCheck


it is a third party  astaro  firewall web filter UTM.

astaro say its us we say its them type of thing.
0
mhamerAuthor Commented:
as far as i know we dont have any issues with logons  (600 users)  to the domain  but so far authentication ofthe astro hasn't worked.
0
Newly released Acronis True Image 2019

In announcing the release of the 15th Anniversary Edition of Acronis True Image 2019, the company revealed that its artificial intelligence-based anti-ransomware technology – stopped more than 200,000 ransomware attacks on 150,000 customers last year.

stronglineCommented:
is this the only DC? if so you can just show them the results and push back. I don't wee why when astro is the only program suffers and they still point to your DC.

How is astro configured? Can it find DC and talk with DC properly? These are only questions that astro can answer.
0
mhamerAuthor Commented:
we have about 30 dc's

we donthave any other "non" windows trying to authenticate.

astaro asks for a bind   cn= etc    which was cut andpasted from dsquery  

it does actually sucseed it just taked 30 seconds  which is impractical

I am of course still pushingastaro but  chances are it is a settin on the DC as  they have not seen this issue before nor has the community around astaro.

Plus same astaro box works fine in test lab with a diffrent domain
0
stronglineCommented:
Ok the delay occurred between function "saf_store" and "try_connect", there should be a lot more other actions in between such as get_dc_list. Find out which was the most delay, then do the same operation from a command line against the same DC.

For example, in above log astaro used 10.20.20.101, it could bind to either 389 or 3268 - I don't know anything about astaro - to make a LDAP connection, so you can open ldp.exe to bind to same DC to see how this DC responses to your own connect/bind requests.

To test how fast you DC give you a dclist, run "nltest /dclist:yourDomainName"
0
mhamerAuthor Commented:
the log was sent by fax, and I typed in a portion there are get dc options prior ill post hole thing shortly

apologies if that confused thigs
0
mhamerAuthor Commented:
this is the full log in dc get  all the addrees pre 10.20.20.101 are at remote sites and all appear to be working not sure what the FAILED is about.  the box itself is just pointed at 10.20.20.100  101 is a second server at same sight also a DC (all are DNS and GC at locations around the country)

our debugging has finished now.
Below you can see that the answer from the domain controller takes much too long.
We started the test at 14:18:52 and 27 Seconds later at 14:19:19 we received the answer.

[2008/02/07 14:18:52, 5] lib/debug.c:debug_dump_status(391)
[2008/02/07 14:18:52, 8] libsmb/namequery.c:get_dc_list(1441)
  Adding 13 DC's from auto lookup
[2008/02/07 14:18:52, 10]
libsmb/namequery.c:remove_duplicate_addrs2(408)
  remove_duplicate_addrs2: looking for duplicate address/port pairs
[2008/02/07 14:18:52, 4] libsmb/namequery.c:get_dc_list(1529)
  get_dc_list: returning 13 ip addresses in an ordered list
[2008/02/07 14:18:52, 4] libsmb/namequery.c:get_dc_list(1530)
  get_dc_list: 10.41.20.100:389 10.90.20.100:389 10.55.20.100:389
10.20.20.101:389 10.52.20.100:389 10.20.20.100:389 172.17.7.146:389
10.75.20.100:389 10.25.20.100:389 10.43.20.100:389 10.31.20.100:389
10.60.20.100:389 10.32.20.100:389
[2008/02/07 14:18:52, 5] libads/ldap.c:ads_try_connect(127)
  ads_try_connect: sending CLDAP request to 10.41.20.100 (realm:
domainname.COM)
[2008/02/07 14:18:52, 1] libads/cldap.c:recv_cldap_netlogon(215)
  no reply received to cldap netlogon
[2008/02/07 14:18:52, 3] libads/ldap.c:ads_try_connect(136)
  ads_try_connect: CLDAP request 10.41.20.100 failed.
[2008/02/07 14:18:52, 10]
libsmb/conncache.c:add_failed_connection_entry(140)
  add_failed_connection_entry: added domain domainname.COM (10.41.20.100) to failed conn cache
[2008/02/07 14:18:52, 5] libads/ldap.c:ads_try_connect(127)
  ads_try_connect: sending CLDAP request to 10.90.20.100 (realm:
domainname.COM)
[2008/02/07 14:18:54, 1] libads/cldap.c:recv_cldap_netlogon(215)
  no reply received to cldap netlogon
[2008/02/07 14:18:54, 3] libads/ldap.c:ads_try_connect(136)
  ads_try_connect: CLDAP request 10.90.20.100 failed.
[2008/02/07 14:18:54, 10]
libsmb/conncache.c:add_failed_connection_entry(140)
  add_failed_connection_entry: added domain domainname.COM (10.90.20.100) to failed conn cache
[2008/02/07 14:18:54, 5] libads/ldap.c:ads_try_connect(127)
  ads_try_connect: sending CLDAP request to 10.55.20.100 (realm:
domainname.COM)
[2008/02/07 14:18:54, 1] libads/cldap.c:recv_cldap_netlogon(215)
  no reply received to cldap netlogon
[2008/02/07 14:18:54, 3] libads/ldap.c:ads_try_connect(136)
  ads_try_connect: CLDAP request 10.55.20.100 failed.
[2008/02/07 14:18:54, 10]
libsmb/conncache.c:add_failed_connection_entry(140)
  add_failed_connection_entry: added domain domainname.COM (10.55.20.100) to failed conn cache
[2008/02/07 14:18:54, 5] libads/ldap.c:ads_try_connect(127)
  ads_try_connect: sending CLDAP request to 10.20.20.101 (realm:
domainname.COM)
[2008/02/07 14:18:54, 10] libsmb/namequery.c:saf_store(71)
  saf_store: domain = [domainname], server = [10.20.20.101], expire = [1202394834]
[2008/02/07 14:19:19, 3]
libads/kerberos_keytab.c:smb_krb5_kt_add_entry(184)
  ads_keytab_add_entry: adding keytab entry for
(HTTP/ASL-ASG@domainname.COM) with encryption type (3) and version (33)
[2008/02/07 14:19:19, 3]
libads/kerberos_keytab.c:smb_krb5_kt_add_entry(184)
  ads_keytab_add_entry: adding keytab entry for
(HTTP/ASL-ASG@domainname.COM) with encryption type (23) and version (33) Joined 'ASL-ASG' to realm 'domainname.COM'
[2008/02/07 14:19:19, 2] utils/net.c:main(988)
  return code = 0


Thats why the authentication can't work at all, since every request will
timeout before we receive the reply from the Domain Controller.

We recommend that the Windows-Domain-Admin dive into his AD-Setup and
check where here is the source of the delay.
0
mhamerAuthor Commented:
.
0
mhamerAuthor Commented:
hi the nltest /dclist:yourDomainName"  comes back instantly (less than a second) with complete and correct dc list
0
stronglineCommented:
[2008/02/07 14:18:54, 10] libsmb/namequery.c:saf_store(71)
  saf_store: domain = [domainname], server = [10.20.20.101], expire = [1202394834]
[2008/02/07 14:19:19, 3]
libads/kerberos_keytab.c:smb_krb5_kt_add_entry(184)

As you can see in above, even though the connection attempt to 10.20.20.100 failed, it didn't take much time for astora switched to 10.20.20.101. The delay occurred after the switch between the samba calls "saf_store" and "smb_krb5_kt_add_entry". I don't know much about samba API, so I really like to hear from astora about what "saf_store" does, in AD/Kerberos terms.

I still think your AD is healthy, the problem lies in somewhere inside astora.
Was the DCdiag taken from 100 or 101? Was there any kerberos event errors/warnings on 101 when astora connected to it (@2008/02/07 14:18:54, 10)?
0
mhamerAuthor Commented:
thanks for commenst also noted krb5krb_err_response_too_big in a net capture.

passed your comments on to astaro "see" if they respond :-)

dc diag from 100

wasnt checking

weekend approaching i can test a lot more on live

setting them to use TCPn for Kerbros only
connecting astaro to only the dc not the network
rebuild astaro etc
0
mhamerAuthor Commented:
also 10.20.20.100 and 10.20.20.101 are only dc that are not R2

there just last to be done, not sur eif that is having huge impact on this though  (have tried with the others)
0
stronglineCommented:
To back youself up better, it's desirable to have a DCdiag from 101 too.

Considering the fact that function "saf_store" is part of namequery.c and the delay, I believe there is DNS resolution issue. The call was timed out.

The problem is that, without knowing what name astaro was trying to resolve, and how it was configured to use DNS, I couldn't get any further.

Ask astaro to test name resolution on their box. Can they resolve the your domain name/DC name properly in a timely manner?
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
mhamerAuthor Commented:
hi, there are a number of diag tool on the astaro including ping and tracerout


pinging     dc.domain.com   and dc  al work and ina decent time frame


dcdiag on 101 has a     bind error 5 (dont think is related this dc has a 16 charecter name)  trieda  ll other 22 dcs and it works fine on thoses
0
stronglineCommented:
Ok I found the source code of saf_store and like to make some changes to my last post:

1. the fact that saf_store was the last one in the log right before the delay doesn't neccessary mean the delay happed within saf_store. It might be some other calls after it but didn't generate any kind of debug info.
2. saf_store's task is just to generate an entry in a cahce file (based on the domain name and server name provided). *IF* the delay did happen within saf_store function, astaro's app must have had difficulties to access that cache file.
3. Saf_store itself didn't involve any name resolution.

Again, the log can't prove anything alone. Astaro has to provide more details such as what was done between [2008/02/07 14:18:54, 10] and [2008/02/07 14:19:19, 3], and then we can go from there.
 
0
stronglineCommented:
can you post the exact msg of "bind error 5"? Anything with kerberos might be related.
did ping domainNameOnly work?
0
mhamerAuthor Commented:
pinging domain

works but becuase i added a static

 Domain Controller Diagnosis

Performing initial setup:
   [domian controller 101] Directory Binding Error 5:
   Win32 Error 5
   This may limit some of the tests that can be performed.
   Done gathering initial info.

Doing initial required tests
0
stronglineCommented:
your DNS should resolve domainName to one of DCs' IPs.
i know I asked before, but did you check your event viewer (system and application logs)?
0
mhamerAuthor Commented:
my dns as in an xp machine or a dc does resolve domain to 10.20.20.100
the astaro  has 10.20.20.100 as a forwarder  but only resloves the domain when i add a static.

logs i checked a few  10.20.20.100  have no mention i think sme sort of round robin is occuring  knowing which server took the request is hard, but i am checking them all of course
cheers

dentist tomorrow will stay at home and trawl logs with fine tooth comb

0
mhamerAuthor Commented:
Hi, although not a solution, you did help a great deal thank you.
ASTARO provided a work around by changing there hardware to hardcoded a coupel of DC's  rath erthan DC get  not great but will do for the test period
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Active Directory

From novice to tech pro — start learning today.