Asked by zephyr_hex (Megan)

Server randomly loses contact with domain

We have two problems, and I think they are symptoms of the same underlying problem because they seem to happen at around the same time. Occurrences are random. Both problems happen on a Windows Server 2003 machine running Terminal Services and hosting a SharePoint WSS 3.0 web front end. All servers in the network are up to date on patches.

Problem 1: New RDP sessions on the server are denied with a message that the RPC server is unavailable. Existing RDP sessions are not disconnected.
Event log:
Windows cannot determine the user or computer name. (The RPC server is unavailable. ). Group Policy processing aborted.

Problem 2: SharePoint stops working. The event log shows:
SQL database login failed. Additional error information from SQL Server is included below.

Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'.

The SharePoint databases are hosted on another server running SQL Server 2005.
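One way to check the hunch that both problems strike together is to export the timestamps of both event types and pair them up. The sketch below is a hypothetical helper, not part of any Windows tool: it assumes you have already turned the Group Policy "RPC server is unavailable" errors and the SQL "ANONYMOUS LOGON" failures into lists of `datetime` values (for example from an Event Viewer export), and the five-minute window is an arbitrary choice.

```python
from datetime import datetime, timedelta


def coinciding_events(gp_errors, sql_errors, window=timedelta(minutes=5)):
    """Return (gp_time, sql_time) pairs whose timestamps lie within `window`.

    gp_errors:  datetimes of the Group Policy "RPC server is unavailable" errors
    sql_errors: datetimes of the "Login failed for user
                'NT AUTHORITY\\ANONYMOUS LOGON'" errors
    window:     how close two events must be to count as coinciding
                (five minutes is an assumed, adjustable value)
    """
    pairs = []
    for gp_time in sorted(gp_errors):
        for sql_time in sorted(sql_errors):
            if abs(sql_time - gp_time) <= window:
                pairs.append((gp_time, sql_time))
    return pairs


if __name__ == "__main__":
    # Placeholder timestamps for illustration, not taken from the real logs.
    gp = [datetime(2000, 1, 1, 8, 0), datetime(2000, 1, 2, 8, 0)]
    sql = [datetime(2000, 1, 1, 8, 3), datetime(2000, 1, 3, 8, 0)]
    for gp_time, sql_time in coinciding_events(gp, sql):
        print(f"Group Policy error at {gp_time}, SQL login failure at {sql_time}")
```

If nearly every Group Policy error has a SQL login failure next to it, that supports the idea of a single underlying cause on this server rather than two separate faults.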

We initially thought this was a problem with a switch because, on several occasions, power cycling the switch resolved the problem. We replaced the switch yesterday, and the problem happened again today.

I don't think the issue originates in SharePoint, because the RDP problem occurs on the same server at the same time. I think that, for some reason, this one server loses connectivity with the domain.

The domain controller is on another server running Windows 2000 Server. That server is not losing its network connection: we run a database app off that server, and it does not have any issues. If the DC were failing in some way or had a network issue, the database app would lose its connection. There are no errors in the DC event log. Moreover, I can RDP to other servers on the domain while the problem is happening on the Terminal Services server, which tells me the domain itself is healthy at that time.

Any ideas on what is going on?
SysExpert

1) Check that the RPC and related services are running properly. Also check licensing and the maximum number of users.

2) Check the DCDIAG log, and use the admin tools to check DNS as well.

It sounds partially like an authentication issue, but it could also be a bad TCP/IP stack somewhere or any of a number of other network-related issues.


I hope this helps!
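For point 1, a service's state can be read from the output of the Windows `sc query <service>` command, whose `STATE` line looks like `STATE : 4  RUNNING`. The sketch below parses that line. Which services count as "related" is an assumption on our part: `RpcSs` (Remote Procedure Call), `Netlogon`, `Dnscache` (DNS Client) and `TermService` (Terminal Services) seem the relevant ones here. It assumes Python 3.7+ on a Windows machine; because this is a Server 2003 box, you would likely run it from a newer machine and query the server remotely with `sc \\server`.

```python
import subprocess
import sys


def parse_sc_state(sc_output):
    """Return the state word (e.g. 'RUNNING') from `sc query` output, or None."""
    for line in sc_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "STATE":
            # value looks like ' 4  RUNNING'; the last token is the state name.
            parts = value.split()
            return parts[-1] if parts else None
    return None


def service_state(name, server=None):
    """Run `sc [\\\\server] query <name>` (Windows only) and return the parsed state."""
    args = ["sc"]
    if server:
        args.append(f"\\\\{server}")  # sc expects the remote name as \\server
    args += ["query", name]
    result = subprocess.run(args, capture_output=True, text=True)
    return parse_sc_state(result.stdout)


if __name__ == "__main__":
    if sys.platform == "win32":
        # Assumed list of RPC-related services; adjust as needed.
        for svc in ("RpcSs", "Netlogon", "Dnscache", "TermService"):
            print(svc, service_state(svc))
```

A service that reports anything other than `RUNNING` during an outage (or `None`, meaning `sc` printed an error instead of a status) is worth following up in the System event log.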
zephyr_hex (Megan), ASKER

On point #1: how do I check RPC? As for licensing, do you mean CAL licensing for the OS or some other kind?

#2: dcdiag results:

Z:\>dcdiag /s:qmvandbserver

Domain Controller Diagnosis

Performing initial setup:
   Done gathering initial info.

Doing initial required tests

   Testing server: Default-First-Site-Name\QMVANDBSERVER
      Starting test: Connectivity
         ......................... QMVANDBSERVER passed test Connectivity

Doing primary tests

   Testing server: Default-First-Site-Name\QMVANDBSERVER
      Starting test: Replications
         [Replications Check,QMVANDBSERVER] A recent replication attempt failed:

            From CITRIX to QMVANDBSERVER
            Naming Context: CN=Schema,CN=Configuration,DC=domain
            The replication generated an error (1722):
            The RPC server is unavailable.
            The failure occurred at 2007-12-06 07:49:11.
            The last success occurred at 2007-08-29 13:49:36.
            2410 failures have occurred since the last success.
            [CITRIX] DsBindWithSpnEx() failed with error 1722,
            The RPC server is unavailable..
            The source remains down. Please check the machine.
         [Replications Check,QMVANDBSERVER] A recent replication attempt failed:

            From CITRIX to QMVANDBSERVER
            Naming Context: CN=Configuration,DC=domain
            The replication generated an error (1722):
            The RPC server is unavailable.
            The failure occurred at 2007-12-06 07:48:48.
            The last success occurred at 2007-08-29 14:31:29.
            7968 failures have occurred since the last success.
            The source remains down. Please check the machine.
         [Replications Check,QMVANDBSERVER] A recent replication attempt failed:

            From CITRIX to QMVANDBSERVER
            Naming Context: DC=domain
            The replication generated an error (1722):
            The RPC server is unavailable.
            The failure occurred at 2007-12-06 07:48:25.
            The last success occurred at 2007-08-29 14:26:30.
            2928 failures have occurred since the last success.
            The source remains down. Please check the machine.
         ......................... QMVANDBSERVER passed test Replications
      Starting test: NCSecDesc
         ......................... QMVANDBSERVER passed test NCSecDesc
      Starting test: NetLogons
         ......................... QMVANDBSERVER passed test NetLogons
      Starting test: Advertising
         ......................... QMVANDBSERVER passed test Advertising
      Starting test: KnowsOfRoleHolders
         ......................... QMVANDBSERVER passed test KnowsOfRoleHolders
      Starting test: RidManager
         ......................... QMVANDBSERVER passed test RidManager
      Starting test: MachineAccount
         ......................... QMVANDBSERVER passed test MachineAccount
      Starting test: Services
         ......................... QMVANDBSERVER passed test Services
      Starting test: ObjectsReplicated
         ......................... QMVANDBSERVER passed test ObjectsReplicated
      Starting test: frssysvol
         ......................... QMVANDBSERVER passed test frssysvol
      Starting test: frsevent
         There are warning or error events within the last 24 hours after the
         SYSVOL has been shared.  Failing SYSVOL replication problems may cause
         Group Policy problems.
         ......................... QMVANDBSERVER failed test frsevent
      Starting test: kccevent
         ......................... QMVANDBSERVER passed test kccevent
      Starting test: systemlog
         ......................... QMVANDBSERVER passed test systemlog
      Starting test: VerifyReferences
         ......................... QMVANDBSERVER passed test VerifyReferences

   Running partition tests on : Schema
      Starting test: CrossRefValidation
         ......................... Schema passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Schema passed test CheckSDRefDom

   Running partition tests on : Configuration
      Starting test: CrossRefValidation
         ......................... Configuration passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... Configuration passed test CheckSDRefDom

   Running partition tests on : domain
      Starting test: CrossRefValidation
         ......................... domain passed test CrossRefValidation
      Starting test: CheckSDRefDom
         ......................... domain passed test CheckSDRefDom

   Running enterprise tests on : domain
      Starting test: Intersite
         ......................... domain passed test Intersite
      Starting test: FsmoCheck
         ......................... domain passed test FsmoCheck
*********************************************
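Note that dcdiag reports "QMVANDBSERVER passed test Replications" even though it lists three failed replication attempts from CITRIX, so the summary lines alone can hide problems. The sketch below is a hypothetical helper that scans saved dcdiag output (for example `dcdiag /s:qmvandbserver > dcdiag.txt`) and collects both the failed tests and the detailed replication errors. The regular expressions are written against the output format shown above and may need adjusting for other dcdiag versions.

```python
import re

# "......................... QMVANDBSERVER failed test frsevent"
RESULT_RE = re.compile(r"\.{3,}\s+(\S+)\s+(passed|failed)\s+test\s+(\S+)")
# "From CITRIX to QMVANDBSERVER"
FROM_RE = re.compile(r"^\s*From (\S+) to (\S+)")
# "Naming Context: DC=domain"
NC_RE = re.compile(r"^\s*Naming Context: (.+?)\s*$")
# "The replication generated an error (1722):"
ERR_RE = re.compile(r"generated an error \((\d+)\)")


def summarize_dcdiag(text):
    """Return (failed_tests, replication_errors) parsed from dcdiag output."""
    failed = []       # list of (server_or_partition, test_name)
    replication = []  # list of dicts: source, dest, naming_context, error
    current = {}
    for line in text.splitlines():
        m = RESULT_RE.search(line)
        if m:
            if m.group(2) == "failed":
                failed.append((m.group(1), m.group(3)))
            continue
        m = FROM_RE.match(line)
        if m:
            current = {"source": m.group(1), "dest": m.group(2)}
            continue
        m = NC_RE.match(line)
        if m and current:
            current["naming_context"] = m.group(1)
            continue
        m = ERR_RE.search(line)
        if m and current:
            current["error"] = int(m.group(1))
            replication.append(current)
            current = {}
    return failed, replication
```

On the output above, this would list the `frsevent` failure plus three replication errors from CITRIX, all with Win32 error 1722 ("The RPC server is unavailable"), which is consistent with CITRIX simply being offline.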

Reviewing the errors, I have a few comments.
CITRIX is a server that is currently out of commission. It is a secondary DC. The errors that pertain to CITRIX say the RPC server is unavailable. Does this just mean that the primary DC (QMVANDBSERVER) is unable to talk to CITRIX?

The other error:
"      Starting test: frsevent
         There are warning or error events within the last 24 hours after the
         SYSVOL has been shared.  Failing SYSVOL replication problems may cause
         Group Policy problems."

This isn't very specific, but the replication is failing because the other DC is down. Could this be causing our problem, and if so, why? Or is it totally unrelated?

How would I track down a bad TCP/IP stack, and can hardware cause TCP/IP stack failures?
Another question:
If this issue were caused by a problem on the DC, or by the fact that the secondary DC is offline, wouldn't we be having problems on our other computers and servers on the domain?

However, our problems all seem to be related to one server that is not a DC. When the problem happens, new RDP sessions can't be established, and the SharePoint web front end has trouble connecting to the back-end database. Also, yesterday I had an RDP session established before the problem happened, and while I was troubleshooting, I eventually got an error when I tried to view the properties of a site in IIS. The error said the connection had been lost and asked whether I wanted to re-establish it.

I don't think the RDP issue is license related. The message says the problem is that it can't contact the RPC server. In the past I've seen that as a generic error, basically stating that the computer can't communicate with a network resource (for example, when setting up a domain trust, if there is a firewall between the two DCs, you get an error about not being able to contact the RPC server).
Based on the symptoms described here:
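If "RPC server is unavailable" really means "can't reach a network resource", a quick test during an outage is whether the affected server can still open TCP connections to the DC. The sketch below uses Python's standard `socket.create_connection`. Port 135 is the RPC endpoint mapper; 88 (Kerberos), 389 (LDAP) and 445 (SMB) are other ports a domain member commonly needs. A successful connect only shows the port is reachable, not that RPC calls will succeed, since the endpoint mapper hands out further dynamic ports.

```python
import socket


def tcp_port_open(host, port=135, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Name-resolution failures, refusals and timeouts all raise OSError
    subclasses, which are reported here as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, running `[(p, tcp_port_open("dc-name", p)) for p in (135, 88, 389, 445)]` on the Terminal Services server while the problem is happening (where `dc-name` is a placeholder for your DC's name or IP address) would show whether the DC is reachable at that moment. Trying both the DC's name and its IP address helps separate a DNS failure from a connectivity failure.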
http://www.mcse.ms/message1580926.html

I have removed the external DNS server from this server's configuration, so its only DNS entry now points to the DC.
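The general rule behind this change is that domain members should list only DNS servers that host the Active Directory zone; an external resolver cannot answer lookups for the domain's records (such as the `_ldap._tcp.dc._msdcs` SRV records), so any query that lands on it fails. The sketch below is a hypothetical helper that pulls the DNS server list out of `ipconfig /all` output (in which extra servers appear on continuation lines under "DNS Servers") and flags any entry that is not in a set of known internal DNS servers you supply. The addresses in the usage example are placeholders.

```python
import ipaddress


def dns_servers_from_ipconfig(text):
    """Collect DNS server addresses listed under 'DNS Servers' in `ipconfig /all` output."""
    servers = []
    collecting = False
    for line in text.splitlines():
        if "DNS Servers" in line:
            collecting = True
            _, _, line = line.partition(":")  # keep the value after the label
        elif not collecting:
            continue
        try:
            servers.append(str(ipaddress.ip_address(line.strip())))
        except ValueError:
            collecting = False  # the first non-address line ends the list
    return servers


def unexpected_dns(servers, internal):
    """Return configured DNS servers that are not in the set of internal AD DNS servers."""
    return [s for s in servers if s not in internal]
```

For example, if `ipconfig /all` lists `192.168.0.10` (the DC) and `203.0.113.53` (an outside resolver) under "DNS Servers", `unexpected_dns(servers, {"192.168.0.10"})` returns `["203.0.113.53"]`, which is the kind of entry that was removed here.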

I will have to wait and see whether the problem recurs, since there doesn't seem to be any pattern to when it happens. It was happening several times a week, so if it does not recur within a week, I am fairly sure it can be declared resolved.
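The "wait a week" reasoning can be made a little more concrete. Assuming (and this is only a modelling assumption) that the fault, if still present, strikes at random as a Poisson process at an average rate of λ times per week, the chance of a completely quiet stretch of t weeks by luck alone is exp(-λt). The sketch below computes that.

```python
import math


def p_quiet(rate_per_week, weeks):
    """Probability of zero occurrences in `weeks` weeks, assuming the fault
    still occurs as a Poisson process averaging `rate_per_week` per week."""
    return math.exp(-rate_per_week * weeks)


if __name__ == "__main__":
    for rate in (2, 3, 5):
        print(f"{rate}/week: P(quiet week) = {p_quiet(rate, 1):.3f}")
```

At an assumed rate of three occurrences per week, an unfixed fault would stay silent for a whole week only about 5% of the time (e^-3 ≈ 0.05), and for two weeks about 0.25% of the time, so a quiet week is reasonable, though not conclusive, evidence of a fix.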

So now it's wait and see...
ASKER CERTIFIED SOLUTION
SysExpert

OK, removing the external DNS server seems to have resolved the problem. We have not had the issue since I removed it, and before that it was happening every few days, sometimes several times a day.

That said, I am awarding points to ChiefIT because he raised some relevant issues that I need to address.

Also, SysExpert's comments about the DC sync are helpful.