Exchange 2010 mailbox server crashes when other MB server reboots

Issue: I have two servers in a DAG. When I move all active database copies to server B & reboot server A, all's fine.  When I move all active database copies to server A and reboot server B, all mailbox databases dismount.  They come back online as soon as Server B is back online.  

Environment:
- 2 Exchange 2010 SP3 mailbox servers in a DAG, 2 CAS/HT servers in NLB cluster
- Windows 2008 R2 Enterprise servers
- Running as VMs on two separate Windows 2012 Hyper-V Hosts
- Primary Witness Server is one CAS/HT server, Secondary Witness Server is the other CAS/HT server.

All health checks make it look like everything's in good working order (server health, replication, etc.)
---------------------------------------------
Errors:
Insight Manager (HP utility to monitor server health): [DAG] System is unreachable.
---------------------------------------------
CAS/HT server:

Warning 1022: MSExchange Transport
"The connection between the Client Access server and Mailbox server "[ServerB]" failed...

Microsoft.Exchange.Data.Storage.ConnectionFailedTransientException: Cannot open mailbox [mailboxname]. ---> Microsoft.Mapi.MapiExceptionLogonFailed: MapiExceptionLogonFailed: Unable to make connection to the server. (hr=0x80040111, ec=-2147221231)
Diagnostic context:"
---------------------------------------------
Critical Error 1016: MSExchange ActiveSync

Exchange ActiveSync has encountered repeated failures when it tries to access data on Mailbox server [ServerB]. It will temporarily stop making requests to the Mailbox server for [60] seconds to reduce load on that server. This delay may occur if the Mailbox server is overloaded. If this event is logged frequently, review the Application log on this server and the Mailbox server noted above for other events that could indicate the root cause of performance problems.
---------------------------------------------
Errors on ServerB:

Critical Error 4066: MSExchangeRepl

An error occurred while trying to write to the cluster database. Error: ClusterRegBatchClose failed with error 1726.

---------------------------------------------
Critical error 4082: MSExchangeRepl

The replication network manager encountered an error while monitoring events. Error: Microsoft.Exchange.Cluster.Replay.AmClusterApiException: An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API '"OpenCluster(ServerB) failed with 0x6d9. Error: There are no more endpoints available from the endpoint mapper"' failed.. ---> System.ComponentModel.Win32Exception: There are no more endpoints available from the endpoint mapper
   --- End of inner exception stack trace ---
   at Microsoft.Exchange.Cluster.Replay.NetworkManager.DriveMapRefresh()
   at Microsoft.Exchange.Cluster.Replay.NetworkManager.TryDriveMapRefresh()
---------------------------------------------

The DAG itself was set up without issue; it originally ran on two physical servers.  We added ServerA to the DAG, retired one physical server, added ServerB, then retired the second physical server.

The DAG has a static IP address which pings from both nodes.

Anyone have any ideas?  I'm quite concerned that if ServerA goes down I'm going to be dead in the water.
CHR3800 asked:

Adam Brown (Sr Solutions Architect) commented:
1. With two Nodes, you should only have one Witness server in the configuration. Having 4 results in an even number of votes, which can cause problems.
2. Before rebooting server B, you'll want to make sure that all of the databases are in a healthy state. Run Get-MailboxDatabase | Get-MailboxDatabaseCopyStatus to view the status of all copies. If any of the database copies is in a state other than Healthy or Mounted, the database will enter a failed state when the server with the healthy copy fails.
3. Check Cluster services to make sure that each server has a vote in the quorum and that both servers are set as possible owners. This can also cause what you're seeing.
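The check in step 2 can be sketched in Python. This is only an illustration: the rows are made-up sample data standing in for the Name and Status columns of the copy status output, and `unsafe_copies` is a hypothetical helper, not part of Exchange.

```python
# Hypothetical sample rows (copy name, status); not taken from this thread.
SAMPLE_COPIES = [
    ("DB01\\ServerA", "Mounted"),
    ("DB01\\ServerB", "Healthy"),
    ("DB02\\ServerA", "Mounted"),
    ("DB02\\ServerB", "FailedAndSuspended"),
]

# The two states step 2 treats as safe.
SAFE_STATUSES = {"Healthy", "Mounted"}


def unsafe_copies(copies):
    """Return the names of copies whose status is neither Healthy nor Mounted."""
    return [name for name, status in copies if status not in SAFE_STATUSES]


if __name__ == "__main__":
    for name in unsafe_copies(SAMPLE_COPIES):
        print(f"Fix before rebooting: {name}")
```

Any copy this flags would need attention before the server holding the good copy is taken down.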
CHR3800 (Author) commented:
Thanks for the response.

My issue ended up being that “The Alternate Witness Server itself does not provide any redundancy for the Witness Server, and DAGs do not dynamically switch witness servers, nor do they automatically start using the Alternate Witness Server in the event of a problem with the Witness Server.”  

So, in the Organization Config I'd defined primary & alternate witness servers, believing that when the primary went down the alternate would take over.  Apparently it doesn't work that way.  So, because I have the primary witness server on the same VM host as one of the mailbox servers, there was no way to establish a quorum when I took both down to patch the host.  The solution for me will be to create a primary witness server on a server that's not part of the Exchange VMs in any way.
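The quorum arithmetic behind this can be sketched in Python. It assumes the usual model for a two-member DAG (Node and File Share Majority), where each mailbox server and the single active witness holds one vote and the cluster needs more than half of all votes to stay up. The names and the `has_quorum` helper are illustrative only. The alternate witness is left out on purpose because, as quoted above, it is not used automatically.

```python
# One vote each; the alternate witness is deliberately not listed (assumption:
# it never votes unless an administrator switches the DAG over to it).
VOTERS = {"ServerA", "ServerB", "Witness"}


def has_quorum(online, voters=VOTERS):
    """True if the online voters form a strict majority of all voters."""
    votes = len(set(online) & set(voters))
    return votes > len(voters) // 2


if __name__ == "__main__":
    # One mailbox server rebooting while the witness stays up: 2 of 3 votes.
    print(has_quorum({"ServerA", "Witness"}))  # True
    # A mailbox server and the witness down together (same VM host): 1 of 3.
    print(has_quorum({"ServerA"}))  # False
```

With the witness on a host that is independent of both mailbox servers, patching either host still leaves two of three votes online.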
Adam Brown (Sr Solutions Architect) commented:
"because I have the primary witness server on the same VM host as one of the mailbox servers" is something you should have mentioned, btw :D
CHR3800 (Author) commented:
I'm accepting my own comment as the solution because it's the right one, which I'd found on my own before having it confirmed by another tech on another site. The one other response here wasn't helpful.