Solved

Server randomly loses contact with domain

Posted on 2007-12-05
Last Modified: 2011-10-03
we have 2 problems and i think they are symptoms of the same underlying problem because both seem to happen around the same time as each other.  occurrences are random.  both problems happen on a windows server 2003 running Terminal Services and hosting a sharepoint wss 3.0 web front end.  all servers in the network are up to date on patches.

problem1: new RDP sessions on the server are denied with a message that the RPC server is unavailable.  current RDP sessions are not disconnected.
event log:
Windows cannot determine the user or computer name. (The RPC server is unavailable. ). Group Policy processing aborted.

problem2:  sharepoint dies.  Event log shows:
SQL database login failed. Additional error information from SQL Server is included below.

Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'.

sharepoint databases are hosted on another server running sql server 2005.

we initially thought this was a problem with a switch because on several occasions, the problem was resolved by power cycling the switch.  we replaced the switch yesterday and the problem happened again today.

i don't think the issue is sourced in sharepoint because of the coinciding issue with RDP on that particular server.  i think, for some reason, there is a loss of connectivity with the domain on this 1 server.

the domain controller is on another server running Windows Server 2000.  that server is not losing network connection -- we run a database app off that server and it does not have any issues.  if the DC were failing in some way or had some kind of network issue, then the database app would lose connection.  there are no errors in the DC Event log.  moreover, i am able to RDP to other servers on the domain when the problem is happening with the server running Terminal Services, which tells me the domain itself is healthy when the problem is happening.

any ideas on what is going on?
Question by:zephyr_hex
 
LVL 63

Expert Comment

by:SysExpert
ID: 20417218
1) Check that the RPC and related services are running properly.  Also check licenses and max. users.

2) Check the output of DCDIAG and use the admin tools to also check DNS.

It sounds partly like an authentication issue, but it could also be a bad TCP/IP stack somewhere or any of a number of other network-related issues.
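One rough way to check the RPC side from another machine is to test whether the relevant TCP ports on the DC accept connections. The following is a minimal sketch, not a full RPC test: it assumes Python is available on the machine running the check, and the port list (135 for the RPC endpoint mapper, plus DNS, LDAP and SMB) is an illustrative choice rather than something taken from this thread. A successful connect only shows the port is reachable, not that the service behind it is healthy.

```python
import socket

# Illustrative port list (an assumption, adjust as needed):
# 135 = RPC endpoint mapper, 53 = DNS, 389 = LDAP, 445 = SMB.
DEFAULT_PORTS = {"rpc-endpoint-mapper": 135, "dns": 53, "ldap": 389, "smb": 445}


def check_tcp_ports(host, ports=DEFAULT_PORTS, timeout=3.0):
    """Return {service_name: bool} saying whether a TCP connect succeeded."""
    results = {}
    for name, port in ports.items():
        try:
            # create_connection resolves the name and opens a TCP connection.
            with socket.create_connection((host, port), timeout=timeout):
                results[name] = True
        except OSError:
            # Covers refused connections, timeouts and name lookup failures.
            results[name] = False
    return results
```

Running `check_tcp_ports("qmvandbserver")` from the Terminal Services box while the problem is happening, and again when it is not, would show whether the DC's ports are reachable in both states.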


I hope this helps !
 
LVL 42

Author Comment

by:zephyr_hex
ID: 20421063
on point #1, how do i check RPC?  as for licensing, is that CAL licensing for the OS or some other kind?

#2: dcdiag results:

Z:\>dcdiag /s:qmvandbserver

Domain Controller Diagnosis

Performing initial setup:
   Done gathering initial info.

Doing initial required tests

   Testing server: Default-First-Site-Name\QMVANDBSERVER
      Starting test: Connectivity
         ......................... QMVANDBSERVER passed test Connectivity

Doing primary tests

   Testing server: Default-First-Site-Name\QMVANDBSERVER
      Starting test: Replications
         [Replications Check,QMVANDBSERVER] A recent replication attempt failed:

            From CITRIX to QMVANDBSERVER
            Naming Context: CN=Schema,CN=Configuration,DC=domain
            The replication generated an error (1722):
            The RPC server is unavailable.
            The failure occurred at 2007-12-06 07:49:11.
            The last success occurred at 2007-08-29 13:49:36.
            2410 failures have occurred since the last success.
            [CITRIX] DsBindWithSpnEx() failed with error 1722,
            The RPC server is unavailable..
            The source remains down. Please check the machine.
         [Replications Check,QMVANDBSERVER] A recent replication attempt failed:

            From CITRIX to QMVANDBSERVER
            Naming Context: CN=Configuration,DC=domain
            The replication generated an error (1722):
            The RPC server is unavailable.
            The failure occurred at 2007-12-06 07:48:48.
            The last success occurred at 2007-08-29 14:31:29.
            7968 failures have occurred since the last success.
            The source remains down. Please check the machine.
         [Replications Check,QMVANDBSERVER] A recent replication attempt failed:

            From CITRIX to QMVANDBSERVER
            Naming Context: DC=domain
            The replication generated an error (1722):
            The RPC server is unavailable.
            The failure occurred at 2007-12-06 07:48:25.
            The last success occurred at 2007-08-29 14:26:30.
            2928 failures have occurred since the last success.
            The source remains down. Please check the machine.
         ......................... QMVANDBSERVER passed test Replications
      Starting test: NCSecDesc
         ......................... QMVANDBSERVER passed test NCSecDesc
      Starting test: NetLogons
         ......................... QMVANDBSERVER passed test NetLogons
      Starting test: Advertising
         ......................... QMVANDBSERVER passed test Advertising
      Starting test: KnowsOfRoleHolders
         ......................... QMVANDBSERVER passed test KnowsOfRoleHolders
      Starting test: RidManager
         ......................... QMVANDBSERVER passed test RidManager
      Starting test: MachineAccount
         ......................... QMVANDBSERVER passed test MachineAccount
      Starting test: Services
         ......................... QMVANDBSERVER passed test Services
      Starting test: ObjectsReplicated
         ......................... QMVANDBSERVER passed test ObjectsReplicated
      Starting test: frssysvol
         ......................... QMVANDBSERVER passed test frssysvol
      Starting test: frsevent
         There are warning or error events within the last 24 hours after the
         SYSVOL has been shared.  Failing SYSVOL replication problems may cause
         Group Policy problems.
         ......................... QMVANDBSERVER failed test frsevent
      Starting test: kccevent
         ......................... QMVANDBSERVER passed test kccevent
      Starting test: systemlog
         ......................... QMVANDBSERVER passed test systemlog
      Starting test: VerifyReferences
         ......................... QMVANDBSERVER passed test VerifyReferences

   Running partition tests on : Schema
      Starting test: CrossRefValidation
         ......................... Schema passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Schema passed test CheckSDRefDom

   Running partition tests on : Configuration
      Starting test: CrossRefValidation
         ......................... Configuration passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Configuration passed test CheckSDRefDom

   Running partition tests on : domain
      Starting test: CrossRefValidation
         ......................... domain passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... domain passed test CheckSDRefDom

   Running enterprise tests on : domain
      Starting test: Intersite
         ......................... domain passed test Intersite
      Starting test: FsmoCheck
         ......................... domain passed test FsmoCheck
*********************************************
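Since the dcdiag output above is long, a small script can pull out just the failed tests and the replication failure counts. This is a sketch that assumes the output has been saved as plain text and follows the same line shapes as the run above; the function name `summarize_dcdiag` and the regular expressions are illustrative, not part of dcdiag itself. The sample is an excerpt of the output posted here.

```python
import re

# Patterns are based only on the line shapes seen in the output above.
FAILED_TEST = re.compile(r"\.{5,}\s+(\S+) failed test (\S+)")
REPL_SOURCE = re.compile(r"From (\S+) to (\S+)")
FAILURE_COUNT = re.compile(r"(\d+) failures have occurred since the last success")

SAMPLE = """\
            From CITRIX to QMVANDBSERVER
            Naming Context: DC=domain
            The replication generated an error (1722):
            The RPC server is unavailable.
            2928 failures have occurred since the last success.
         ......................... QMVANDBSERVER passed test Replications
      Starting test: frsevent
         ......................... QMVANDBSERVER failed test frsevent
"""


def summarize_dcdiag(text):
    """Collect failed tests and per-link replication failure counts."""
    failed_tests = FAILED_TEST.findall(text)
    replication = []
    current = None
    for line in text.splitlines():
        match = REPL_SOURCE.search(line)
        if match:
            # A new "From X to Y" line starts a new replication entry.
            current = {"source": match.group(1), "target": match.group(2), "failures": None}
            replication.append(current)
            continue
        match = FAILURE_COUNT.search(line)
        if match and current is not None:
            current["failures"] = int(match.group(1))
    return {"failed_tests": failed_tests, "replication": replication}


summary = summarize_dcdiag(SAMPLE)
print(summary["failed_tests"])
print(summary["replication"])
```

Note that in the run above the Replications test is reported as passed even though every attempt from CITRIX failed, so the per-link counts are worth reading separately from the pass/fail lines.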

so, reviewing the errors, i have a few comments
CITRIX is a server that is currently out of commission.  it is a secondary DC.  the errors that pertain to CITRIX mention the RPC server being unavailable.  does this just mean that the primary DC (QMVANDBSERVER) is unable to talk to CITRIX?

the other error:
"      Starting test: frsevent
         There are warning or error events within the last 24 hours after the
         SYSVOL has been shared.  Failing SYSVOL replication problems may cause
         Group Policy problems."

this isn't very specific... but the replication is failing because the other DC is down... could this be causing our problem and if so, why?  or is this totally unrelated?

how would i track down a bad TCP/IP stack, and is hardware the cause of TCP/IP stack failures?
 
LVL 42

Author Comment

by:zephyr_hex
ID: 20421819
another question...
if this issue is caused by a problem on the DC or by the fact that the secondary DC is offline, would we not be having problems on our other computers and servers on the domain?

however, our problems seem to be all related to one server that is not a DC.  when the problem happens, we can't have new RDP sessions, and sharepoint web front end has connection issues to the backend DB.  also, yesterday i had an RDP session established before the problem happened... and as i was troubleshooting the problem, i eventually had an error when i tried to look at Properties of a site in IIS ... the error pertaining to losing connection and asking me if i wanted to reestablish the connection...

i don't think the RDP issue is license related.  It says the problem is that it can't contact the RPC server.  in the past i've seen that as a generic error... basically stating that the computer can't communicate with a network resource (for example, when setting up a domain trust, if there is a firewall between the two DCs, you get an error about not being able to contact the RPC server).
 
LVL 42

Author Comment

by:zephyr_hex
ID: 20424122
based on symptoms described here:
http://www.mcse.ms/message1580926.html

i have removed the external DNS from this server so the only DNS entry points to the DC.

will have to wait and see if the problem re-occurs since there doesn't seem to be any pattern as to when it happens.  it was happening several times a week, so if it does not re-occur in a week's time, i am fairly sure it can be declared resolved.

so now it's a wait and see...
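A domain member that has an external DNS server in its list can end up asking that server for Active Directory records it does not hold, which would fit the intermittent nature of the problem. To check other servers for the same misconfiguration, the DNS entries can be read out of saved `ipconfig /all` output. The sketch below assumes the usual "DNS Servers . . . :" layout with extra servers on indented continuation lines; every address in the sample is hypothetical and not taken from this network.

```python
import ipaddress
import re

DNS_LINE = re.compile(r"^\s*DNS Servers[ .]*:\s*(\S+)\s*$")
CONTINUATION = re.compile(r"^\s+(\d{1,3}(?:\.\d{1,3}){3})\s*$")

# Hypothetical excerpt; addresses are made up for illustration.
SAMPLE = """\
   IP Address. . . . . . . . . . . . : 192.168.1.20
   Default Gateway . . . . . . . . . : 192.168.1.1
   DNS Servers . . . . . . . . . . . : 192.168.1.10
                                       203.0.113.53
   Primary WINS Server . . . . . . . : 192.168.1.10
"""


def dns_servers(ipconfig_text):
    """Return the DNS server addresses listed in ipconfig /all output."""
    servers = []
    in_block = False
    for line in ipconfig_text.splitlines():
        match = DNS_LINE.match(line)
        if match:
            servers.append(match.group(1))
            in_block = True
            continue
        # Additional servers appear as bare addresses on the following lines.
        match = CONTINUATION.match(line) if in_block else None
        if match:
            servers.append(match.group(1))
        else:
            in_block = False
    return servers


def unexpected_dns_servers(servers, allowed):
    """Return configured servers that are not in the allowed (internal) list."""
    allowed_set = {ipaddress.ip_address(a) for a in allowed}
    return [s for s in servers if ipaddress.ip_address(s) not in allowed_set]


print(unexpected_dns_servers(dns_servers(SAMPLE), allowed=["192.168.1.10"]))
```

Comparing against an explicit list of internal DNS servers is a deliberate choice here: a private-range check alone would not catch an internal address that is not actually a domain DNS server.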
 
 
LVL 63

Accepted Solution

by:
SysExpert earned 250 total points
ID: 20425270
Well that certainly is a start.

Don't forget that if the DC servers do not replicate for 30 days, the syncing issue becomes serious and needs to be resolved, possibly manually.
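To put a number on it, the replication age can be worked out from the timestamps dcdiag prints. This is a small sketch using the Schema naming context timestamps from the dcdiag run earlier in this thread; the 30-day default is simply the figure quoted above, and the actual limit in a given forest depends on its tombstone lifetime setting, so treat `limit_days` as an assumption to adjust.

```python
from datetime import datetime

# Timestamp layout as printed by dcdiag in this thread.
FMT = "%Y-%m-%d %H:%M:%S"


def days_since_last_success(last_success, now):
    """Whole days elapsed between two dcdiag-style timestamps."""
    delta = datetime.strptime(now, FMT) - datetime.strptime(last_success, FMT)
    return delta.days


def replication_is_stale(last_success, now, limit_days=30):
    """True if replication has been failing for longer than limit_days."""
    return days_since_last_success(last_success, now) > limit_days


# Schema naming context: last success and most recent failure from the run above.
age = days_since_last_success("2007-08-29 13:49:36", "2007-12-06 07:49:11")
print(age, replication_is_stale("2007-08-29 13:49:36", "2007-12-06 07:49:11"))
```

With those timestamps the gap is 98 whole days, well past the 30 days mentioned here.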


I hope this helps !
 
 
LVL 38

Assisted Solution

by:ChiefIT
ChiefIT earned 250 total points
ID: 20426089
Your initial diagnosis of the switch may have been correct. It may not be a bad switch, just a dumb switch. Dumb switches need to be configured for spanning tree port fast. Spanning Tree Port Fast can cause intermittent connectivity issues. When you replaced one switch with the other, you may have replaced a dumb switch with another dumb switch.

Another thing that can cause intermittent comms is the mode of operation of switches and routers. On Cisco switches and routers, the modes have to be identical for the devices to talk to each other. You would think a port set to 100 Mb/s full duplex would talk with a router set to auto. It won't. It is just a quirk with Cisco. Can we get the make and model of your switch and router?
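As background on why a forced port and an auto port can disagree (this explanation is added here and is not from the thread): a port forced to a fixed mode generally does not take part in autonegotiation, so the auto side can detect the speed but typically falls back to half duplex. The toy model below is a simplified sketch of that rule, with hypothetical function names, and it assumes both ends support full duplex when both negotiate.

```python
def effective_duplex(local, peer):
    """Duplex the local port ends up using.

    Each setting is one of "auto", "full" or "half".
    """
    if local != "auto":
        return local   # a forced setting is used as configured
    if peer == "auto":
        return "full"  # both sides negotiate (assumes both support full duplex)
    return "half"      # peer is forced and does not negotiate, so fall back


def has_duplex_mismatch(port_a, port_b):
    """True if the two ends would run with different duplex settings."""
    return effective_duplex(port_a, port_b) != effective_duplex(port_b, port_a)


for pair in [("full", "auto"), ("auto", "auto"), ("full", "full")]:
    print(pair, has_duplex_mismatch(*pair))
```

In this model a full/auto pairing is the mismatched one, which is the case described above; a mismatch tends to show up as errors and poor throughput under load rather than a dead link.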

Another possibility is that the router needs a manually configured list of LAN DNS servers. The router is the middle man for communications between the clients and server. From the client side, you could periodically be getting a DNS server that is not listed on the router.

The most likely cause of intermittent comms is dual NICs on the server or spanning tree port fast on the switch.
 
LVL 42

Author Comment

by:zephyr_hex
ID: 20450434
ok... removing the external DNS seems to have resolved the problem.  we have not had the issue since i removed it, and it was happening every few days, and sometimes several times a day.

that said, i am awarding points to ChiefIT because he has brought up some relevant issues that i need to address.

also, SysExpert's comments about the DC sync are helpful.
