pramod1
asked on
ACTIVE DIRECTORY
What results should one expect to see in dcdiag output?
One of our Exchange servers is having RPC failures, and it talks to one of the AD servers.
It should show every test as passed. If there's a failed test, you would need to review it.
ASKER
Which tests should I focus on from an Exchange point of view, like KCC or anything else?
ASKER
The guy just ran this: C:\Windows\system32>dcdiag /test:replications. It didn't give any result.
Exchange is not monitored using DCDIAG. DCDIAG & REPLMON are for domain controllers only.
Running dcdiag /test:replications would display results for connectivity & partition tests.
Other options I use are:
repadmin /showrepl to show replication.
dcdiag /e /c /v
You could copy the below code, save it to a batch file (filename.bat), and run it as admin. It will drop the results as a text file on your desktop.
@echo off
rem Collect AD diagnostics into a single log on the current user's desktop.
set "LOG=%userprofile%\desktop\ad_diag.log"
echo Running dcdiag /e /c /v...
rem Use > for the first command so each run starts a fresh log; the rest append with >>.
dcdiag /e /c /v > "%LOG%"
echo Running dcdiag /test:DNS /DNSALL (may take a few moments, be patient)...
dcdiag /test:DNS /DNSALL /e /v >> "%LOG%"
rem DcPromo and RegisterInDNS normally expect /DnsDomain:<your domain>; add it if these tests only report a usage error.
echo Running dcdiag /test:DcPromo /e /v...
dcdiag /test:DcPromo /e /v >> "%LOG%"
echo Running dcdiag /test:RegisterInDNS...
dcdiag /test:RegisterInDNS >> "%LOG%"
rem netdiag.exe came with the Windows Server 2003 Support Tools; remove this step on newer servers.
echo Running netdiag.exe /v...
netdiag.exe /v >> "%LOG%"
echo Running netsh dhcp show server...
netsh dhcp show server >> "%LOG%"
echo Running repadmin /showreps...
repadmin /showreps >> "%LOG%"
echo Running repadmin /replsum /errorsonly...
repadmin /replsum /errorsonly >> "%LOG%"
echo ...
echo Diagnostic Completed Successfully...
echo view results in %LOG%
pause
echo ...
echo ...
echo ...
echo General Health Ratio (FAILS/PASSES)
echo This is very general, check %LOG% for info
echo ...
echo NUMBER OF FAILS
rem find /c counts lines containing the word, not individual tests.
find /c /i "fail" "%LOG%"
echo ...
echo NUMBER OF PASSES
find /c /i "pass" "%LOG%"
pause
As for Exchange I usually run either the Best Practice analyzer (Start>>Run>>ExBpa>>ENTER) or from Exchange PowerShell run Get-ServerHealth -server YOUREXSERVER | format-table -autosize > C:\ServerHealth.log (Drops log onto C drive)
YOUREXSERVER would refer to your mail server name.
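The fail/pass count at the end of the batch file above matches any line containing "fail" or "pass", so verbose detail lines inflate both numbers. A minimal Python sketch of a stricter count is below. It assumes the usual dcdiag summary lines of the form "......................... DC1 passed test Connectivity"; the function name, the sample text, and the server name DC1 are made up for illustration.

```python
import re
from collections import Counter

# dcdiag prints one summary line per test, for example:
#   "......................... DC1 passed test Connectivity"
RESULT_LINE = re.compile(
    r"^\s*\.+\s+(\S+)\s+(passed|failed)\s+test\s+(\S+)", re.IGNORECASE
)


def summarize_dcdiag(log_text):
    """Return (counts, failures) built only from dcdiag summary lines."""
    counts = Counter()
    failures = []
    for line in log_text.splitlines():
        match = RESULT_LINE.match(line)
        if not match:
            continue  # skip verbose detail lines
        target, outcome, test = match.group(1), match.group(2).lower(), match.group(3)
        counts[outcome] += 1
        if outcome == "failed":
            failures.append((target, test))
    return counts, failures


# Hypothetical excerpt; DC1 is an invented server name.
SAMPLE = """\
   Starting test: Connectivity
      ......................... DC1 passed test Connectivity
   Starting test: Replications
      [Replications Check,DC1] A recent replication attempt failed:
      ......................... DC1 failed test Replications
      ......................... ForestDnsZones passed test CheckSDRefDom
"""

if __name__ == "__main__":
    counts, failures = summarize_dcdiag(SAMPLE)
    print(counts["passed"], "passed,", counts["failed"], "failed")
    for target, test in failures:
        print("FAILED:", target, test)
```

To run it against the real log, read ad_diag.log into a string first (opening it with errors="replace" avoids trouble with the console code page) and pass that string to summarize_dcdiag.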
ASKER
dcdiag /test:replications :
Running partition tests on : ForestDnsZones
Running partition tests on : DomainDnsZones
Running partition tests on : Schema
Running partition tests on : Configuration
Does it show more results? Or should I run dcdiag /e /c /v also? Is it relevant?
ASKER
He didn't post any further results. Should I have expected more from dcdiag /test:replications?
What are you trying to achieve? Depending on what you are testing for it could or could not be relevant.
If you merely want to know whether the partition & connection tests are ok, then running the replication test should be enough.
In addition to the dcdiag /test:replications I would run the repadmin /showrepl command.
This is provided that you are checking AD Replication health etc.
The dcdiag /test:replications usually looks like this (I blacked out some info not relevant to the results)
dcdiag /e /c /v gives a more detailed status of your domain controllers' health.
But neither of these options is very relevant to Exchange.
Are you able to explain your issue/goal perhaps please?
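For the repadmin /showrepl check mentioned above, the CSV form (repadmin /showrepl * /csv) is easier to scan than the text output. The Python sketch below lists only the links with a non-zero failure count. The column names are an assumption based on the typical CSV header, so verify them against your own output; the site and DC names in the sample are invented.

```python
import csv
import io


def failing_links(csv_text):
    """Return (source, destination, naming context, failures) for failing links."""
    reader = csv.DictReader(io.StringIO(csv_text))
    bad = []
    for row in reader:
        # Column name assumed from typical "repadmin /showrepl * /csv" output.
        failures = (row.get("Number of Failures") or "0").strip()
        if failures.isdigit() and int(failures) > 0:
            bad.append(
                (
                    row.get("Source DSA"),
                    row.get("Destination DSA"),
                    row.get("Naming Context"),
                    int(failures),
                )
            )
    return bad


# Hypothetical sample; Site1 and DC1..DC3 are invented names.
SAMPLE_CSV = """\
showrepl_COLUMNS,Destination DSA Site,Destination DSA,Naming Context,Source DSA Site,Source DSA,Transport Type,Number of Failures,Last Failure Time,Last Success Time,Last Failure Status
showrepl_INFO,Site1,DC1,"DC=example,DC=com",Site1,DC2,RPC,0,0,2017-07-01 10:00:00,0
showrepl_INFO,Site1,DC1,"CN=Configuration,DC=example,DC=com",Site1,DC3,RPC,4,2017-07-01 09:55:00,2017-06-30 22:00:00,1722
"""

if __name__ == "__main__":
    for source, dest, context, count in failing_links(SAMPLE_CSV):
        print(f"{source} -> {dest} [{context}]: {count} failure(s)")
```

An empty result means no link reported failures, which is the state you want to see before ruling AD replication out.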
ASKER
There was an RPC failure one day on one of the DAG servers and we could not find the root cause, so we suspect it lost contact with its preferred DC.
Below error:
A server-side administrative operation has failed. The Microsoft Exchange Replication service may not be running on server . Specific RPC error message: Error 0x6d9 (There are no more endpoints available from the endpoint mapper) from cli_GetCopyStatusEx2
Ok, in this case there is no need to run dcdiag or replmon, as this is an Exchange-side issue.
Your service may not have been running. Check Services to see whether Microsoft Exchange Replication is running, and check the System event log for Event IDs 7024, 7035, and 7036.
In addition running the Exchange Best Practice Analyzer might add some info on what could be improved on the server.
But besides checking event logs which is what I would review first, the Exchange Powershell Command Get-ServerHealth -server SERVERNAME | format-table -autosize > C:\ServerHealth.log might give you a lot more indication on potential issues.
In addition perhaps the system patched & rebooted.
Sometimes an alert is triggered if the service had not started yet when the DB tried to mount. I would look for a reboot around the time this error occurred. It might have an innocent explanation, such as a post-patching restart.
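When going through the System log for those Service Control Manager events, it helps to separate the informational IDs from the ones that indicate a crash. The Python sketch below is an illustration only. It covers the IDs named above plus 7031 (which shows up later in this thread) and 7034, which is my own addition to the list; the triage function and the sample records are hypothetical.

```python
# Service Control Manager event IDs commonly seen in the System log.
SCM_EVENTS = {
    7024: "service terminated with a service-specific error",
    7031: "service terminated unexpectedly, recovery action scheduled",
    7034: "service terminated unexpectedly",
    7035: "service was sent a start/stop control (informational)",
    7036: "service entered the running or stopped state (informational)",
}

# IDs that point at a crash or failed start rather than normal operation.
FAILURE_IDS = {7024, 7031, 7034}


def triage(events):
    """Split (event_id, message) pairs into failures and informational entries."""
    failures, informational = [], []
    for event_id, message in events:
        if event_id not in SCM_EVENTS:
            continue  # not a Service Control Manager event we track
        bucket = failures if event_id in FAILURE_IDS else informational
        bucket.append((event_id, SCM_EVENTS[event_id], message))
    return failures, informational


# Hypothetical records, for example copied from an exported System log.
SAMPLE_EVENTS = [
    (7035, "The Microsoft Exchange Replication service was successfully sent a start control."),
    (7031, "The Microsoft Exchange Replication service terminated unexpectedly."),
    (6005, "The Event log service was started."),
]

if __name__ == "__main__":
    failures, informational = triage(SAMPLE_EVENTS)
    for event_id, meaning, message in failures:
        print(f"{event_id}: {meaning}: {message}")
```

A lone 7035 or 7036 only shows that a service was started or stopped. The failure IDs are the ones worth correlating with the time of the RPC error.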
ASKER
Get-ServerHealth -server SERVERNAME | format-table -autosize > C:\ServerHealth.log
It says Get-ServerHealth is not recognized as a cmdlet. I am running it on an Exchange 2010 DAG.
Right, my bad. That indeed does not work on older Exchange versions. I had assumed it was Exchange 2013 or later.
Check the event logs if not done so already. In addition run the Exchange Best Practice Analyzer which might display some bottlenecks.
ASKER
no we never patched that server
ASKER
I don't see any events except 7035, which succeeded. Could it be related to an I/O error?
ASKER
it was 1 week before
ASKER
there is another hub server where winrm went down
Ouch. You probably do want to patch it. There are a gazillion high-risk vulnerabilities out there which you are susceptible to without patches.
In addition there are separate Cumulative Updates for Exchange to improve stability, reliability & performance besides bug fixes.
The latest is CU18 here (for Exchange 2010 the package is named Update Rollup 18 for SP3).
Another important one is the May 2017 Rollup to protect against the Petya Ransomware.
Security patching might not be relevant to your issue right now, but CU18 might be applicable, as the rollups include improvements that affect the areas reported by the error. For now, check the event logs and report back if they show any reboots or service shutdowns.
Also when did you last reboot if you do not patch it?
And what is the service reported by Event ID 7035 please?
ASKER
it passed
Surprised. Not often that all Best Practice Checks pass. Must be fairly well maintained :)
So the system has not rebooted recently, there are no event IDs showing problems with the Exchange services, and the Best Practice check is clean.
In this case the only thing left really would be to monitor for recurrences and patch the server. The error is server-side, so unless it lost connection to the DAG (which you might be able to check through cluster management), there is not much to go on besides writing it off as a hiccup and monitoring for recurrences, unless others have more suggestions to add.
ASKER
I see that one day before, that hub server lost its connection to the mail gateway due to a networking issue. Could this be the cause of RPC and WinRM going down?
If RPC goes down, WinRM won't respond. But for RPC to go down, one would expect an Event ID with an error under either System or Application for the time it was unavailable. Sometimes antivirus is running a scan and not all Exchange components have been properly excluded from it. Other typical causes are reboots, network congestion, and the server running out of resources. But with the event logs pretty clean, from my understanding, it will be nearly impossible to trace the cause, which leaves us with a lot of guesswork.
You might want to supply some more data.
e.g. Are you running on VMWare or Hyper-V, and if yes which version.
What version and patch level of Windows are you running.
Does Exchange 2010 have any service packs applied (Assuming no as it was never patched)
Did any other servers report issues during the same time frame.
Is this a repeat issue or a one off etc.
ASKER
rpc went down on mailbox server on same day and winrm went down on another hub transport server on same day
ASKER
Exchange Server 2010
Microsoft Corporation
Version: 14.03.0361.001
ASKER
one time issue, VMware version 5.5
ASKER
The mailbox server logged Event ID 7031: "The Microsoft Exchange Replication service terminated unexpectedly. It has done this 1 time(s). The following corrective action will be taken in 5000 milliseconds: Restart the service."
Ok thats something. What other events occurred around that time frame? Anything leading up to it? There might be informative events or other warnings & alerts providing some insight.
ASKER
Resource Exhaustion Detector: low virtual memory; search indexer stopped unexpectedly; "The Microsoft Search (Exchange) service terminated unexpectedly. It has done this 7 time(s)." These were logged just before the RPC failure.
ASKER
The following programs consumed the most virtual memory: store.exe (13700) consumed 38865256448 bytes, svchost.exe (2000) consumed 1129340928 bytes, and w3wp.exe (10944) consumed 848666624 bytes.
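The byte counts in that event are hard to read at a glance. A small Python sketch that pulls the process names out of the message and converts the figures to GiB is below; the regular expression and the function name are illustrative and only assume the message layout quoted above.

```python
import re

# Event text as quoted in the post above.
MESSAGE = (
    "The following programs consumed the most virtual memory: "
    "store.exe (13700) consumed 38865256448 bytes, "
    "svchost.exe (2000) consumed 1129340928 bytes, "
    "and w3wp.exe (10944) consumed 848666624 bytes."
)

# Matches "<process> (<pid>) consumed <bytes> bytes".
ENTRY = re.compile(r"([\w.]+) \((\d+)\) consumed (\d+) bytes")


def memory_consumers(message):
    """Return (process, pid, GiB) tuples parsed from the event text."""
    return [
        (name, int(pid), int(size) / 2**30)
        for name, pid, size in ENTRY.findall(message)
    ]


if __name__ == "__main__":
    for name, pid, gib in memory_consumers(MESSAGE):
        print(f"{name} (PID {pid}): {gib:.2f} GiB")
```

By this conversion store.exe alone accounts for roughly 36 GiB of virtual memory, far more than the other two processes combined.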
ASKER
So can this issue be because of memory? Is that why RPC terminated?
ASKER
where can I check cu 18 update ?
ASKER
Resource Exhaustion Detector: low virtual memory; search indexer stopped unexpectedly; "The Microsoft Search (Exchange) service terminated unexpectedly. It has done this 7 time(s)." These were logged just before the RPC failure.
any reasons why resource exhaustion detector occurred?
CU18 can be obtained here. Will potentially take a few hours to apply.
To check your current version, start Exchange PowerShell and enter the command Get-Command ExSetup | ForEach {$_.FileVersionInfo}. The results will give you a build number, which you can compare here to see what patch level you are on.
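When comparing build numbers, note that the same build can be written with or without leading zeros (14.03.0361.001 versus 14.3.361.1), so comparing the strings directly can mislead. A minimal Python sketch that compares builds numerically is below. The helper names are made up, and the reference build in the example is a placeholder rather than a real release.

```python
def parse_build(version):
    """Turn '14.03.0361.001' into (14, 3, 361, 1) so builds compare numerically."""
    return tuple(int(part) for part in version.strip().split("."))


def is_older(installed, reference):
    """True if the installed build is lower than the reference build."""
    return parse_build(installed) < parse_build(reference)


if __name__ == "__main__":
    installed = "14.03.0361.001"  # build reported earlier in this thread
    reference = "14.03.0999.000"  # placeholder, not a real release
    print(parse_build(installed))
    print("older than reference:", is_older(installed, reference))
```

Take the reference value from the published build number list for your Exchange version before drawing any conclusion about patch level.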
And yes, not having enough resources to run everything Exchange requires can result in RPC errors.
Applying CU18 might lower memory consumption and relieve some of the pressure, as there have been various improvements to Exchange since its release. Applying Exchange 2010 Service Pack 3, if not done yet, is also recommended (plus all critical and security patches, for security reasons).
Before making any changes, always make sure you have a proper backup, of course.
As for memory: if you are still having issues after patching, you can try increasing virtual memory first. But the performance will not be as good as increasing physical (or virtually assigned, if using VMware/Hyper-V/Acropolis) system memory.
In addition, make sure that your antivirus is set up with the proper exclusions for Exchange 2010.
Exclusion references can be located on the below url.
https://technet.microsoft.com/en-us/library/bb332342(v=exchg.141).aspx
Hope all this helps.
Good luck.
ASKER
But is there any specific reason why the Resource Exhaustion Detector events occurred repeatedly on one day and then didn't show up again after rebooting?
If the system is not rebooted frequently, or if it was handling a large bulk mailing, it might have had trouble keeping up.
It all depends on what the system was doing at that point in time, but since it is in the past it won't be obvious, as you generally need to catch the issue as it occurs.