Exchange 2013 DAG databases failing over nightly

Hello All,

I am having some difficulty with my DAG for Exchange 2013. Every morning when I come in, I find that my databases have automatically moved over to another server.

We have 6 servers on VMware vSphere, which takes snapshots of them nightly:

3 CAS servers in a CAS array
3 mailbox servers with 4 databases each.

I cannot find any reason why this is occurring nightly other than the snapshots. Unfortunately, I am not familiar with vSphere or the nightly snapshots, as I do not perform that task. I am only responsible for the Exchange servers.

There is also no pattern to which server ends up with the databases each morning. One morning I will find all of B's databases on A; the next, possibly all of A's databases moved to B. Sometimes I will even see a combination of databases scattered across all three servers.

Has anyone seen this behavior before? Why is this automatically happening every night? (snapshots?)

Please let me know what information I can provide to assist with troubleshooting this problem.

Many thanks to the Experts Exchange community.

Andy M, Internal Systems Manager, commented:
I'm not very experienced with vSphere, but a database in Exchange 2013 will only fail over automatically if the witness server and the secondary mailbox server are unable to get a response from the primary Exchange server at some point.

Since this is happening every night, and based on the information provided, I suspect that while the snapshot is running it effectively pauses the Exchange server. That prevents any connections to it and results in the failover taking place.
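
To make that reasoning concrete, here is a rough Python sketch of the arithmetic. It assumes the cluster declares a node unreachable after a fixed number of missed heartbeats sent at a fixed interval. The function names are hypothetical, and the delay and threshold values are illustrative assumptions only, not values read from this environment; the real settings should be checked on the cluster itself.

```python
def heartbeat_tolerance_seconds(delay_ms, threshold):
    """Longest pause a node can survive before its peers consider it down.

    delay_ms  -- interval between cluster heartbeats, in milliseconds (assumed)
    threshold -- number of consecutive missed heartbeats tolerated (assumed)
    """
    return delay_ms * threshold / 1000.0


def pause_would_trigger_failover(pause_seconds, delay_ms, threshold):
    """True if a VM pause of pause_seconds exceeds the heartbeat tolerance."""
    return pause_seconds > heartbeat_tolerance_seconds(delay_ms, threshold)


# Illustrative values only: a 1000 ms heartbeat with 5 missed beats allowed.
tolerance = heartbeat_tolerance_seconds(1000, 5)
long_pause = pause_would_trigger_failover(8, 1000, 5)
short_pause = pause_would_trigger_failover(2, 1000, 5)
print(tolerance, long_pause, short_pause)
```

Under those assumed values, any snapshot-related pause longer than a few seconds would be enough for the other nodes to treat the server as failed.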

The first port of call would be to check the event logs on the Exchange mailbox servers. When a failover takes place, it is logged in the Application log. That should give you a better idea of what time the failover actually occurs and whether there are any other errors or warnings around the same time.
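
Once the failover times are known, comparing them against the snapshot schedule is simple. The following is a minimal Python sketch of that comparison. It assumes the timestamps have already been collected by hand from the event log and from the vSphere task history; the function name, the 10-minute window, and all sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def failovers_near_snapshots(failover_times, snapshot_times,
                             window=timedelta(minutes=10)):
    """Pair each failover with the first snapshot that started within `window` of it."""
    matches = []
    for failover in failover_times:
        for snapshot in snapshot_times:
            if abs(failover - snapshot) <= window:
                matches.append((failover, snapshot))
                break  # one matching snapshot per failover is enough
    return matches


# Hypothetical sample data, not taken from the servers in this thread.
failovers = [
    datetime(2015, 3, 2, 1, 3),
    datetime(2015, 3, 3, 1, 4),
    datetime(2015, 3, 4, 14, 30),
]
snapshots = [
    datetime(2015, 3, 2, 1, 0),
    datetime(2015, 3, 3, 1, 0),
    datetime(2015, 3, 4, 1, 0),
]

matched = failovers_near_snapshots(failovers, snapshots)
print(len(matched), "of", len(failovers), "failovers fall near a snapshot")
```

If most failovers land inside the window, the snapshots are the likely trigger. Failovers that fall outside it would point to a separate cause worth investigating.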
nyma11 (Author) commented:

Thank you for your prompt response. I can see the snapshot times and can verify that they correlate with the databases failing over.

What is the proper procedure for taking snapshots in VMware on an Exchange server?

I do not see the point in taking snapshots of the entire server.

Would it be safe to simply take a snapshot of the passive and lagged databases only? I do not understand why there needs to be a snapshot of the whole server and the active databases as well.

Unless I am missing something, if a failure occurs we can simply use the passive copy, recreate the server, and place the databases onto the recreated server.

Please correct me if I am mistaken.

Thanks again for your time.
Hi. What I believe is happening in your environment is that the new Exchange 2013 feature called Managed Availability is initiating the failovers.
I would suggest reading more about it to understand how it works, and then tuning it for your environment.

Andy M, Internal Systems Manager, commented:
Hi Nyma

To be honest, I'm not sure of the actual procedure for snapshots, as we've never used them in any of the virtualized environments we look after. We generally use dedicated backup software to back up the server and the Exchange databases, as it gives us greater control over restoring mailboxes in the event of a problem.

Provided the replication between the database copies is healthy, I would assume you could just take a snapshot of the passive database. Personally, though, I would get a second opinion from someone with more vSphere experience to make sure.
nyma11 (Author) commented:
I looked at the documentation you provided, and everything appears "healthy". Any other ideas?