zephyr_hex (Megan)
asked on
Server randomly loses contact with domain
we have 2 problems and i think they are symptoms of the same underlying problem because both seem to happen around the same time as each other. occurrences are random. both problems happen on a windows server 2003 running Terminal Services and hosting a sharepoint wss 3.0 web front end. all servers in the network are up to date on patches.
problem1: new RDP sessions on the server are denied with a message that the RPC server is unavailable. current RDP sessions are not disconnected.
event log:
Windows cannot determine the user or computer name. (The RPC server is unavailable. ). Group Policy processing aborted.
problem2: sharepoint dies. Event log shows:
SQL database login failed. Additional error information from SQL Server is included below.
Login failed for user 'NT AUTHORITY\ANONYMOUS LOGON'.
sharepoint databases are hosted on another server running sql server 2005.
we initially thought this was a problem with a switch because on several occasions, the problem was resolved by power cycling the switch. we replaced the switch yesterday and the problem happened again today.
i don't think the issue is sourced in sharepoint because of the coinciding issue with RDP on that particular server. i think, for some reason, there is a loss of connectivity with the domain on this 1 server.
the domain controller is on another server running Windows Server 2000. that server is not losing network connection -- we run a database app off that server and it does not have any issues. if the DC were failing in some way or had some kind of network issue, then the database app would lose connection. there are no errors in the DC Event log. moreover, i am able to RDP to other servers on the domain when the problem is happening with the server running Terminal Services, which tells me the domain itself is healthy when the problem is happening.
any ideas on what is going on?
ASKER
on point #1, how do i check RPC? as for licensing, is that CAL licensing for the OS or some other kind?
#2: dcdiag results:
Z:\>dcdiag /s:qmvandbserver
Domain Controller Diagnosis
Performing initial setup:
Done gathering initial info.
Doing initial required tests
Testing server: Default-First-Site-Name\QMVANDBSERVER
Starting test: Connectivity
......................... QMVANDBSERVER passed test Connectivity
Doing primary tests
Testing server: Default-First-Site-Name\QMVANDBSERVER
Starting test: Replications
[Replications Check,QMVANDBSERVER] A recent replication attempt failed:
From CITRIX to QMVANDBSERVER
Naming Context: CN=Schema,CN=Configuration,DC=domain
The replication generated an error (1722):
The RPC server is unavailable.
The failure occurred at 2007-12-06 07:49:11.
The last success occurred at 2007-08-29 13:49:36.
2410 failures have occurred since the last success.
[CITRIX] DsBindWithSpnEx() failed with error 1722,
The RPC server is unavailable..
The source remains down. Please check the machine.
[Replications Check,QMVANDBSERVER] A recent replication attempt failed:
From CITRIX to QMVANDBSERVER
Naming Context: CN=Configuration,DC=domain
The replication generated an error (1722):
The RPC server is unavailable.
The failure occurred at 2007-12-06 07:48:48.
The last success occurred at 2007-08-29 14:31:29.
7968 failures have occurred since the last success.
The source remains down. Please check the machine.
[Replications Check,QMVANDBSERVER] A recent replication attempt failed:
From CITRIX to QMVANDBSERVER
Naming Context: DC=domain
The replication generated an error (1722):
The RPC server is unavailable.
The failure occurred at 2007-12-06 07:48:25.
The last success occurred at 2007-08-29 14:26:30.
2928 failures have occurred since the last success.
The source remains down. Please check the machine.
......................... QMVANDBSERVER passed test Replications
Starting test: NCSecDesc
......................... QMVANDBSERVER passed test NCSecDesc
Starting test: NetLogons
......................... QMVANDBSERVER passed test NetLogons
Starting test: Advertising
......................... QMVANDBSERVER passed test Advertising
Starting test: KnowsOfRoleHolders
......................... QMVANDBSERVER passed test KnowsOfRoleHolders
Starting test: RidManager
......................... QMVANDBSERVER passed test RidManager
Starting test: MachineAccount
......................... QMVANDBSERVER passed test MachineAccount
Starting test: Services
......................... QMVANDBSERVER passed test Services
Starting test: ObjectsReplicated
......................... QMVANDBSERVER passed test ObjectsReplicated
Starting test: frssysvol
......................... QMVANDBSERVER passed test frssysvol
Starting test: frsevent
There are warning or error events within the last 24 hours after the
SYSVOL has been shared. Failing SYSVOL replication problems may cause
Group Policy problems.
......................... QMVANDBSERVER failed test frsevent
Starting test: kccevent
......................... QMVANDBSERVER passed test kccevent
Starting test: systemlog
......................... QMVANDBSERVER passed test systemlog
Starting test: VerifyReferences
......................... QMVANDBSERVER passed test VerifyReferences
Running partition tests on : Schema
Starting test: CrossRefValidation
......................... Schema passed test CrossRefValidation
Starting test: CheckSDRefDom
......................... Schema passed test CheckSDRefDom
Running partition tests on : Configuration
Starting test: CrossRefValidation
......................... Configuration passed test CrossRefValidation
Starting test: CheckSDRefDom
......................... Configuration passed test CheckSDRefDom
Running partition tests on : domain
Starting test: CrossRefValidation
......................... domain passed test CrossRefValidation
Starting test: CheckSDRefDom
......................... domain passed test CheckSDRefDom
Running enterprise tests on : domain
Starting test: Intersite
......................... domain passed test Intersite
Starting test: FsmoCheck
......................... domain passed test FsmoCheck
*********************************************
so, reviewing the errors, i have a few comments
CITRIX is a server that is currently out of commission. it is a secondary DC. the errors that pertain to CITRIX mention the RPC server being unavailable. does this just mean that the primary DC (QMVANDBSERVER) is unable to talk to CITRIX?
the other error:
" Starting test: frsevent
There are warning or error events within the last 24 hours after the
SYSVOL has been shared. Failing SYSVOL replication problems may cause
Group Policy problems."
this isn't very specific... but the replication is failing because the other DC is down... could this be causing our problem and if so, why? or is this totally unrelated?
how would i track down a bad TCP/IP stack, and is hardware the cause of TCP/IP stack failures?
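For anyone reading a long dcdiag log like the one above, here is a minimal Python sketch that pulls out the tests reported as failed. It assumes only the summary-line format visible in this output ("......... TARGET passed|failed test NAME"); the function names and the SAMPLE text are made up for illustration and are not part of dcdiag.

```python
import re

# Matches dcdiag summary lines such as:
#   "......................... QMVANDBSERVER failed test frsevent"
RESULT_LINE = re.compile(r"^\.+\s+(\S+)\s+(passed|failed)\s+test\s+(\S+)$")


def failed_tests(dcdiag_output):
    """Return (target, test_name) pairs for every test reported as failed."""
    failures = []
    for line in dcdiag_output.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match and match.group(2) == "failed":
            failures.append((match.group(1), match.group(3)))
    return failures


def replication_failure_count(dcdiag_output):
    """Count replication attempts that dcdiag flags as failed."""
    marker = "A recent replication attempt failed"
    return sum(marker in line for line in dcdiag_output.splitlines())


# Shortened excerpt in the same format as the output posted above.
SAMPLE = """\
Starting test: Replications
[Replications Check,QMVANDBSERVER] A recent replication attempt failed:
From CITRIX to QMVANDBSERVER
......................... QMVANDBSERVER passed test Replications
Starting test: frsevent
......................... QMVANDBSERVER failed test frsevent
Starting test: FsmoCheck
......................... domain passed test FsmoCheck
"""

if __name__ == "__main__":
    for target, test in failed_tests(SAMPLE):
        print(f"{target}: {test} failed")
    print("replication attempts reported as failed:",
          replication_failure_count(SAMPLE))
```

Note that in the output above the Replications test is marked "passed" even though three replication attempts from CITRIX failed, so the summary lines alone would miss them. That is why the sketch counts the replication failure messages separately.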
ASKER
another question...
if this issue is caused by a problem on the DC or the fact that the secondary DC is offline, would we not be having problems on our other computers and servers on the domain?
however, our problems seem to be all related to one server that is not a DC. when the problem happens, we can't have new RDP sessions, and sharepoint web front end has connection issues to the backend DB. also, yesterday i had an RDP session established before the problem happened... and as i was troubleshooting the problem, i eventually had an error when i tried to look at Properties of a site in IIS ... the error pertaining to losing connection and asking me if i wanted to reestablish the connection...
i don't think the RDP issue is license related. It says the problem is that it can't contact the RPC server. in the past i've seen that as a generic error... basically stating that the computer can't communicate with a network resource (for example, when setting up a domain trust, if there is a firewall between the two DCs, you get an error about not being able to contact the RPC server).
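Since "RPC server is unavailable" often just means the machine cannot reach something on the network, one rough way to narrow it down is to test, from the affected server, whether the DC's usual TCP ports answer while the problem is happening. Below is a minimal Python sketch of that idea. The port list is the standard set (DNS 53, Kerberos 88, RPC endpoint mapper 135, LDAP 389, SMB 445); the function names are made up for illustration, and a successful TCP connect only shows the port is reachable, not that the service behind it is healthy.

```python
import socket

# Well-known TCP ports a domain member normally needs to reach on a DC.
DC_PORTS = {
    "DNS": 53,
    "Kerberos": 88,
    "RPC endpoint mapper": 135,
    "LDAP": 389,
    "SMB": 445,
}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_dc(host, ports=DC_PORTS):
    """Return {service_name: reachable} for each port in ports."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

Calling check_dc("qmvandbserver") from the Terminal Services box, once while things work and once during an outage, would show whether name resolution or a specific port is what goes away. If every entry comes back False at once, name resolution or basic connectivity is the more likely suspect than RPC itself.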
ASKER
based on symptoms described here:
http://www.mcse.ms/message1580926.html
i have removed the external DNS from this server so the only DNS entry points to the DC.
will have to wait and see if the problem re-occurs since there doesn't seem to be any pattern as to when it happens. it was happening several times a week, so if it does not re-occur in a week's time, i am fairly sure it can be declared resolved.
so now it's a wait and see...
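To make the DNS change above easy to re-check later, here is a small Python sketch that compares the DNS servers configured on a machine against the list of domain DNS servers and reports anything else. The reasoning, as far as I understand it, is that a domain member which sometimes queries an external resolver may fail to find the DC, because the external server knows nothing about the internal domain. The function name and all addresses are hypothetical: 192.168.1.10 stands in for the DC and 203.0.113.53 for an ISP resolver.

```python
import ipaddress


def unexpected_dns_servers(configured, domain_dns):
    """Return configured resolvers that are not one of the domain's DNS servers."""
    allowed = {ipaddress.ip_address(addr) for addr in domain_dns}
    return [addr for addr in configured
            if ipaddress.ip_address(addr) not in allowed]


if __name__ == "__main__":
    # Hypothetical addresses, not taken from this network.
    dc_dns = ["192.168.1.10"]
    before = ["192.168.1.10", "203.0.113.53"]  # DC plus an external resolver
    after = ["192.168.1.10"]                   # DC only

    print(unexpected_dns_servers(before, dc_dns))  # ['203.0.113.53']
    print(unexpected_dns_servers(after, dc_dns))   # []
```

The configured list would have to be read off the NIC settings (for example from ipconfig /all) and typed in by hand; the sketch does not query the system itself.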
ASKER
oops... wrong link above
http://www.eggheadcafe.com/software/aspnet/31251700/help-exchange-2003-is-lo.aspx
ASKER
ok... removing the external DNS seems to have resolved the problem. we have not had the issue since i removed it, and it was happening every few days, and sometimes several times a day.
that said, i am awarding points to ChiefIT because he has brought up some relevant issues that i need to address.
also, SysExpert's comments about the DC sync are helpful.
2) Check the DCDIAG log, and use the admin tools to check DNS as well.
It sounds partly like an authentication issue, but it could be a bad TCP/IP stack somewhere or a number of other network-related issues.
I hope this helps !