Good Morning,
I have an interesting problem with my Exchange server, but first, a few details about my setup.
There are three servers involved:
1 - Win2k3 R2 Domain Controller (SP2) - This is my primary domain controller. It holds all FSMO roles and is a GC.
2 - Win2k3 R2 Domain Controller (SP2) - This is my secondary domain controller. It is also a GC.
3 - Win2k3 R2 Member Server (SP2) running Exchange 2003 (SP2). This is my Exchange server, the one with the errors.
Now then, here are my issues:
1. Randomly (not daily, not even weekly, but roughly every 10 days) I get Event ID 8026, "LDAP Bind was unsuccessful on directory", which names the server and states that it is down.
2. Several minutes after that error, I get one or two Event ID 2102 entries: "Process MAD.EXE (PID=3052). All Domain Controller Servers in use are not responding:" followed by both of my domain controllers.
3. This one is new, and it happened while I was rebooting my domain controllers, so I don't know how relevant it is, but I will include it. Event ID 9074: "The Directory Service Referral interface failed to service a client request. RFRI is returning the error code:[0x3f0]."
These three errors, starting with the LDAP bind error, are disconnecting my clients from the Exchange server. If I do nothing, everything corrects itself within about 20-40 minutes.
Corrective actions thus far include:
1. Running NTDSUtil to verify that AD contains no stale metadata that could confuse Exchange about which GC to use. I found no dead links or bad data, only the two domain controllers.
2. Running LDP.exe and binding to the domain controller(s) on port 389 to confirm LDAP connectivity and tree status.
3. Using Spotlight on AD to track active LDAP sessions on my DCs, in case I was overwhelming LDAP with too many requests. That's not it either: it never gets above 47 sessions.
4. Verified that all subnets are correct in AD Sites and Services, as suggested in another thread.
5. Verified that all NTDS Settings are correct and that no replication errors have occurred.
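As a complement to the LDP check in step 2, here is a minimal sketch (Python, not part of the original troubleshooting) that tests raw TCP reachability of the LDAP port (389) and the Global Catalog port (3268) on each DC from the Exchange server, with connect latency. It only confirms that a connection can be opened, not that an LDAP bind succeeds, and the host names are hypothetical placeholders.

```python
import socket
import time

# Hypothetical DC host names; replace with your own domain controllers.
DOMAIN_CONTROLLERS = ("dc1.example.local", "dc2.example.local")
# 389 = LDAP, 3268 = Global Catalog LDAP
PORTS = {"LDAP": 389, "GC": 3268}


def probe(host, port, timeout=3.0):
    """Try a TCP connection; return (True, seconds) or (False, error text)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError as exc:  # covers refused, timed out, and DNS failures
        return False, str(exc)


def check_all(hosts=DOMAIN_CONTROLLERS, ports=PORTS, timeout=3.0):
    """Probe every (host, port) pair and collect the results."""
    results = {}
    for host in hosts:
        for name, port in ports.items():
            results[(host, name)] = probe(host, port, timeout)
    return results


if __name__ == "__main__":
    for (host, name), (ok, detail) in check_all().items():
        status = f"OK in {detail * 1000:.1f} ms" if ok else f"FAILED: {detail}"
        print(f"{host} {name}: {status}")
```

Run once while the 8026 error is active; a FAILED or unusually slow result points at the network path rather than at AD itself.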
I have also searched extensively on this subject. I have read the KB articles on DCPromo and they do not apply to my situation. I have tried every solution I have found, and I am still having these issues.
Personally, I am starting to lean towards a hardware issue with the switches, but if that's the case, why does DCDiag pass when I run it while the error is happening? Very odd.
Any help or suggestions beyond what I've already done would be IMMENSELY appreciated.
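Because DCDiag is a point-in-time check, one way to test the switch theory is to probe continuously and timestamp every failure so it can be lined up against the 8026/2102 events. A minimal sketch, assuming Python is available on the Exchange server; the host names, 30-second interval, and log file name are hypothetical, and only TCP reachability (not an LDAP bind) is checked.

```python
import socket
import time
from datetime import datetime

# Hypothetical targets: both DCs on the LDAP (389) and GC (3268) ports.
TARGETS = [
    ("dc1.example.local", 389), ("dc1.example.local", 3268),
    ("dc2.example.local", 389), ("dc2.example.local", 3268),
]


def reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def poll_once(targets, timeout=3.0):
    """Return one timestamped line for every target that failed this round."""
    stamp = datetime.now().isoformat(timespec="seconds")
    return [f"{stamp} UNREACHABLE {host}:{port}"
            for host, port in targets if not reachable(host, port, timeout)]


def monitor(targets=TARGETS, interval=30.0, logfile="ldap_probe.log", rounds=None):
    """Poll every `interval` seconds and append failures to `logfile`.

    rounds=None keeps polling forever.
    """
    done = 0
    while rounds is None or done < rounds:
        failures = poll_once(targets)
        if failures:
            with open(logfile, "a", encoding="utf-8") as fh:
                fh.write("\n".join(failures) + "\n")
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)


if __name__ == "__main__":
    monitor()
```

If the log shows gaps at the same times as Event 8026 while DCDiag still passes afterwards, that supports a transient network problem rather than a persistent AD fault.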
Thanks in advance!
-Masterlubu