Exchange MBX Database drops

We have recently been seeing random issues where our users lose connectivity through OWA and Outlook 2010/2013 when connecting to our Exchange 2013 environment.

I have been seeing numerous errors such as:

Event ID 2138 - MSExchange ADAccess
Process Microsoft.Exchange.EdgeSyncSvc.exe (PID=9616). Exchange Active Directory Provider received a request to connection to domain controller DC01.domain.local but that domain controller is not available. Use the Ping or PathPing command-line tools to test network connectivity to local domain controllers. Run the Dcdiag command line tool to test domain controller health.

Event ID 10006 - MSExchange Mid-Tier Storage
Active Manager Client experienced an AD timeout trying to lookup object '1f7f97fb-a752-496c-8d32-925ead0ae9ab' in 00:01:00.

Event ID 10006 - MSExchange Mid-Tier Storage
Active Manager Client experienced an AD timeout trying to lookup object 'DAG01' in 00:01:00.

Event ID 1006 - MSExchangeDiagnostics
The performance counter '\\EXCH05\MSExchangeIS Store(db06)\% RPC Requests' sustained a value of '100.00', for the '10' minute(s) interval starting at '6/9/2015 3:22:00 PM'. Additional information: None. Trigger Name:StorePercentRpcRequestsTrigger. Instance:db06
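For context on that last event: the trigger name suggests an alert that fires when a counter stays at or above a limit for a whole interval. Below is a rough sketch of that idea only, assuming one sample per minute. The function name `sustained_at_or_above` is hypothetical and this is not how Exchange itself implements the trigger.

```python
def sustained_at_or_above(samples, threshold, window):
    """Return the start index of the first run of `window` consecutive
    samples that are all >= threshold, or None if there is no such run."""
    run_length = 0
    for index, value in enumerate(samples):
        if value >= threshold:
            run_length += 1
            if run_length >= window:
                # The run started `window - 1` samples before this one.
                return index - window + 1
        else:
            # Any dip below the threshold resets the run.
            run_length = 0
    return None
```

With per-minute samples, `sustained_at_or_above(samples, 100.0, 10)` would only report a hit if the counter held 100 for ten samples in a row, which is the pattern the event describes for db06.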

Where would you recommend starting the troubleshooting?

This is an Exchange 2013 Server running on VMware Infrastructure. The MBX databases are presented from our NetApp filers.

Thanks in advance.
Christian Hans Asked:

Will Szymkowski, Senior Solution Architect, Commented:
Based on the errors you are seeing, it looks like AD is the issue, possibly because of SRV records that are present but shouldn't be. Have you had any recent issues with your AD domain?

Try running the following commands and post back the results.
- Repadmin /replsum
- Repadmin /showrepl
- Repadmin /bridgeheads
- DCDiag /v

Also check your SRV records in the _msdcs.domain.com folder under your domain.com zone.
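If it helps to gather everything in one pass, here is a minimal sketch that runs the four commands above and saves each output to a text file for posting. It assumes Python is available on a domain controller or a host with the AD DS tools installed; the folder name `ad_diag_output` and the helper names are made up for illustration.

```python
import subprocess
from pathlib import Path

# The four commands suggested above, as argument lists.
COMMANDS = {
    "repadmin_replsum": ["repadmin", "/replsum"],
    "repadmin_showrepl": ["repadmin", "/showrepl"],
    "repadmin_bridgeheads": ["repadmin", "/bridgeheads"],
    "dcdiag_v": ["dcdiag", "/v"],
}


def run_command(args):
    """Run one command and return its stdout followed by stderr as text."""
    try:
        done = subprocess.run(
            args, capture_output=True, text=True, errors="replace", timeout=600
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Tool not installed on this host, or it did not finish in time.
        return f"FAILED to run {' '.join(args)}: {exc}\n"
    return done.stdout + done.stderr


def collect(out_dir, runner=run_command):
    """Run every command and write each output to <out_dir>/<name>.txt."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for name, args in COMMANDS.items():
        target = out_path / f"{name}.txt"
        target.write_text(runner(args), encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    for path in collect("ad_diag_output"):
        print("wrote", path)
```

Run it from an elevated prompt. The `runner` parameter is only there so the collection logic can be tried without the real tools present.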

Will.
Jeff Rodgers, Networks & Communications Systems Manager, Commented:
Off the hop, one question I need to ask: how many domain controllers do you have? Exchange can absolutely hammer AD, and as the number of mailboxes increases, so too does the pounding. Most of the above errors can be caused by a slow connection to AD.

The solution would be to add additional Domain Controllers (I tend to make most of my DCs Global Catalogs as well).
Christian Hans, Author, Commented:
4 out of the 5 DCs we have at this site are Global Catalog servers...

I will run those commands shortly and post an update. Thanks.

Jeff Rodgers, Networks & Communications Systems Manager, Commented:
And how many mailboxes?
Christian Hans, Author, Commented:
Just under 10,000 mailboxes
1 DAG
3 Mailbox Role servers
6 Mailbox Databases
systechadmin, Consultant, Commented:
I think this needs to be taken to Microsoft. It all started in April: I saw this issue in two different production setups, and the same issue was reported to me by a different organization. The only solution was to reboot the server where the event was being generated.
Jeff Rodgers, Networks & Communications Systems Manager, Commented:
That is a sizeable deployment. Are 10,000 mailboxes being well enough served by the 4 Global Catalogs in the site?

Do you have any performance monitors installed that would tell you what is happening on these servers? It smacks of performance bottlenecks in its communications with AD.
Christian Hans, Author, Commented:
I've opened a ticket with Microsoft support. So far we have been sending logs back and forth to try to determine the cause of this issue. 10,000 is a large number for 4 global catalog servers, but it should be adequate, no?

I'm not running performance monitors on there right now, but I'd like to turn something on for the next 7 days if possible, monitoring the growth naturally. What specific performance monitors should I be looking at?
Will Szymkowski, Senior Solution Architect, Commented:
"I've opened a ticket with Microsoft support. So far we have been sending logs back and forth to try to determine the cause of this issue. 10,000 is a larger number for 4 global catalog servers but it should be adequate no?"

Really, this does not depend on how many DCs/GCs you have; it is more about how many resources each DC/GC has (CPU, cores, RAM, etc.).

If you have checked the resources on your domain controllers and they are reasonable (not maxed out), then the events you have posted above should not be related to this at all. As I stated in my first post, have you run the commands I posted? What are the results?
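One way to check whether a DC is maxed out is to log counters to CSV (for example with typeperf or a Performance Monitor data collector set) and then look at the average and peak of each column. The sketch below assumes a CSV whose first column is a timestamp and whose other columns are one counter each. The 80% cut-off and the sample rows are made up for illustration, not a recommended threshold.

```python
import csv
import io


def summarize_counters(csv_text):
    """Return {counter_name: (average, peak)} for a counter log in CSV form.

    Assumes the first column is a timestamp and every other column is one
    counter; cells that are blank or non-numeric are skipped.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return {}
    header, data = rows[0], rows[1:]
    summary = {}
    for col in range(1, len(header)):
        values = []
        for row in data:
            if col >= len(row):
                continue  # short row, nothing recorded for this counter
            try:
                values.append(float(row[col]))
            except ValueError:
                continue  # blank or non-numeric sample
        if values:
            summary[header[col]] = (sum(values) / len(values), max(values))
    return summary


if __name__ == "__main__":
    # Made-up sample rows in the shape of a counter log.
    sample = "\n".join([
        r'"(PDH-CSV 4.0)","\\DC01\Processor(_Total)\% Processor Time"',
        '"06/09/2015 15:22:00.000","42.0"',
        '"06/09/2015 15:22:15.000","98.0"',
    ])
    for name, (avg, peak) in summarize_counters(sample).items():
        flag = "  <-- worth a closer look" if avg > 80 else ""
        print(f"{name}: avg={avg:.1f} peak={peak:.1f}{flag}")
```

Comparing the same counters across all the DCs in the site over the same period should show whether one of them is carrying most of the load.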

Will.
Question has a verified solution.
